Python Data Analysis Tool Kit - Outline
- Python language advantages
- Basic extension packages for data analysis and visualization
- Goals of the Python data analysis tool kit
- The Data class
- Data class based applications and analysis code interfaces
- efit.py example
- Visualization tools
- Software installation, availability, and documentation
Thrust 1 YearEnd Rev 01/30/07 02:35 pm 1 DAG Talk 1
Python Language Advantages
- Easy to learn, object oriented scripting language
  - Simplified C/C++ constructs + automatic memory management = easy to learn and easy to write
  - Useful set of built-in data types: numbers, lists, tuples, strings, dictionaries, and methods on these objects, e.g. regular expressions on strings
  - Class structure to add new data types
  - Objects no longer referenced are cleared from memory
  - Structured syntax (indentation) = easy to read
  - Supported in emacs, vi
- Automatic documentation (pydoc): programmer-written in-line strings + automatic descriptions can be displayed in the interpreter or written to html
- Object oriented features (class inheritance) provide a simple and efficient technique for the extension or customization of software
- Modular structure: import into memory only the features you need
- Runs in the interpreter or as a script; automatically compiled to byte code
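The inheritance and in-line documentation points can be illustrated with a minimal sketch. The class names `Signal` and `ScaledSignal` are hypothetical and are not part of the tool kit; the sample values are made up.

```python
class Signal:
    """A named sequence of samples (hypothetical example class)."""

    def __init__(self, name, values):
        self.name = name
        self.values = list(values)

    def mean(self):
        """Return the arithmetic mean of the samples."""
        return sum(self.values) / len(self.values)


class ScaledSignal(Signal):
    """A Signal whose samples are multiplied by a fixed factor on creation."""

    def __init__(self, name, values, factor):
        # Reuse the parent constructor, then add the customization.
        Signal.__init__(self, name, [v * factor for v in values])
        self.factor = factor


# Made-up plasma current samples in amperes, scaled to MA.
ip = ScaledSignal("ip", [1.0e6, 1.2e6, 1.1e6], 1.0e-6)
print(ip.mean())            # mean() is inherited from Signal
print(Signal.mean.__doc__)  # the in-line string that pydoc would display
```

The subclass adds one behaviour and inherits everything else, which is the extension pattern the slide refers to.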
Python Language Advantages (continued)
- Open source (GPL-compatible license), free, and available for UNIX, Linux, MacOS, Windows: www.python.org
- Mature, stable, strongly supported, and widely used in the open source community, e.g. used extensively in Linux system software
- Language can be easily extended with fast code written in a compiled language (shared libraries):
  - C API (python features in C)
  - Automatic code wrappers for C (SWIG) and FORTRAN (f2py)
  - Python interpreter can be embedded in programs written in other languages
- Standard modules cover a broad range, e.g. XML parsing, ftp, sockets, threads, regular expressions, process control, etc.
  - Import only the modules you need
- Many third-party extension modules, e.g. interfaces to MDSplus and relational databases, scientific analysis
Packages of Basic Numeric Extension Modules
- Numeric: array manipulation and linear algebra
- pymultipack: B-splines, integration, minimization, solution of non-linear equations, ordinary differential equations
- pysignaltools: signal convolution and filters
- pyfftw: forward and inverse FFTs in multiple dimensions with threading
- pyspecialfuncs, stats: special functions
- ScientificPython: interpolation, least squares, vector and tensor analysis, parallel processing
- pyslatec: full SLATEC FORTRAN library (auto-wrapped)
Packages of Basic Data Archive Interface Modules
- pmds (pydatautils package): base interface to MDSplus
  - mdsconnect, mdsopen, mdsvalue, mdsput, mdsdisconnect
- mdsutils (Datautils package): simplified TCL interface, tree and node creator
- Ptdata (pydatautils package): direct (d3lib) interface to D3D ptdata
- psycopg: interface to the open source postgresql relational database server
- msdb (pydatautils package): base interface to Microsoft SQL server
  - Works with either Sybase Open Client or FreeTDS
- msdbtools (datautils package): copy tables, get columns in a table, ...
- Scientific.IO.NetCDF: netcdf file interface
- Scientific.IO.FortranFormat: FORTRAN format IO
- namelist_class (pynamelist package): interface to FORTRAN namelist files (+, - overloaded)
Packages of Graphics and Widget Modules
Graphics
- pyppgplot: python interface to pgplot + pgxtal
- pplot: simple plot methods on Data class instances
- pyscreens: pgplot based, object oriented, multi-window, multi-graph plot builder
- BLT: graphics extension to the TCL/TK widget set
- pyd3tools (M.Wade): general DIII-D data plotting widget
- pygnuplot: interface to gnuplot graphics
- Gist: LLNL graphics package
Widgets
- TCL/TK through the Tkinter (low level) and PMW (high level) interfaces
- pygtk (Gnome desktop). Linux: pygtk2 and pyglade (XML widget builder); Linux, HP-UX: pygtk1. Has a graphics extension (not installed)
- pyqt (KDE desktop): not installed (Linux only); has a graphics extension
- wxpython (layer over various widget sets): not installed
Python Data Analysis Tool Kit Goals and Approach
- Create a set of routines for data manipulation that are high level but not specific to a particular analysis goal, interfaced to a simple scripting language, so that a researcher can quickly build an analysis tool for a specific purpose.
- Higher level data processing elements (FFT, ...) combined with medium level (array processing, ...) and low level (iteration, ...) elements.
- Easy to use interfaces between data archives (MDS+, ...) and data processing elements.
- Easy to use interfaces between the IO of standard analysis codes (EFIT, ONETWO, ...) and data processing elements.
- Visualization tools
Python Data Analysis Tool Kit Data Class
Instances of class Data are the basic building blocks for analysis applications; the class is defined in modules in the pydatautils package.
- data.py: highest level module
    >>> from data import *
  - imports all Data class features
  - defines higher level methods, e.g. signal.fft()
- data_init.py: instantiation functions; subclass and submodule of data.py
    >>> ne = Data( 'tsne_core', 122336 )
  1) looks in a table on the postgresql server to see what MDS+ tree or PTDATA branch tsne_core is in; wild card characters result in a search and a list of options
  2) reads the signal into ne.y, the error bars into ne.yerror, and the axes into ne.x
  3) reads any subnodes (signal and atomic types) into substructures
- data_base.py: basic arithmetic and algebraic functions on Data class objects; subclass of Data and submodule of data.py
  - Overloads +, -, *, /, **, %, algebraic functions (Sqrt, Log, Tan, ...), and slices ([2,:])
  - error bars propagated
  - time bases interpolated
  - Math errors (zero divide) masked
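As a rough illustration of operator overloading with error bar propagation, here is a minimal sketch. `MiniData` is a hypothetical toy class, not the tool kit's Data class; it assumes both operands share the same time base (so no interpolation is needed) and that their errors are independent, which the slides do not state.

```python
import math


class MiniData:
    """Toy stand-in for a signal with error bars (not the real Data class)."""

    def __init__(self, y, yerror):
        self.y = list(y)
        self.yerror = list(yerror)

    def __add__(self, other):
        # Sum: absolute errors add in quadrature (assumes independent errors).
        y = [a + b for a, b in zip(self.y, other.y)]
        err = [math.hypot(ea, eb) for ea, eb in zip(self.yerror, other.yerror)]
        return MiniData(y, err)

    def __mul__(self, other):
        # Product: first-order propagation, sigma^2 = (b*ea)^2 + (a*eb)^2.
        y = [a * b for a, b in zip(self.y, other.y)]
        err = [math.hypot(b * ea, a * eb)
               for a, b, ea, eb in zip(self.y, other.y,
                                       self.yerror, other.yerror)]
        return MiniData(y, err)


# Made-up sample values and error bars.
i = MiniData([1.0, 2.0], [0.3, 0.0])
v = MiniData([2.0, 5.0], [0.4, 0.5])
total = i + v
power = i * v
print(total.y, total.yerror)
print(power.y, power.yerror)
```

With this pattern an expression such as `i * v` carries its uncertainty along without any extra code at the call site.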
Data Class Methods
Methods on Data class objects
- .cdfput(): Write instance to a netcdf file
- .conj(): Complex conjugate
- .contour(): Generate contours on a 2D instance
- .der(): First derivative
- .dump(): Write to an ASCII file
- .fft(): Fast Fourier transform
- .fit(): Fit to some standard or user supplied function. A call method is created based on the fit.
- .imag(): Imaginary part of a complex instance
- .int(): Integrate
- .interp_fun(): Creates an interpolating call method on the instance
- .inv_fft(): Inverse FFT
- .list(): Lists name and ranges of values and axes
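The idea of a "call method" created on an instance can be sketched with Python's `__call__`. The example below uses plain linear interpolation as a stand-in for the fits and splines named above; the class name `Interpolated` and the clamping at the end points are assumptions made for illustration.

```python
from bisect import bisect_right


class Interpolated:
    """Hypothetical class: samples (x, y) that can be called like a function."""

    def __init__(self, x, y):
        # x is assumed to be strictly increasing.
        self.x = list(x)
        self.y = list(y)

    def __call__(self, xq):
        # Outside the sampled range, return the nearest end value (assumption).
        if xq <= self.x[0]:
            return self.y[0]
        if xq >= self.x[-1]:
            return self.y[-1]
        # Find the interval [x[k], x[k+1]] containing xq.
        k = bisect_right(self.x, xq) - 1
        x0, x1 = self.x[k], self.x[k + 1]
        y0, y1 = self.y[k], self.y[k + 1]
        return y0 + (y1 - y0) * (xq - x0) / (x1 - x0)


f = Interpolated([0.0, 1.0, 2.0], [0.0, 10.0, 40.0])
print(f(0.5), f(1.5))
```

Once the object is callable, it can be passed anywhere a function of x is expected.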
Data Class Methods (continued)
Methods on Data class objects
- .mdsput(): Writes instance as a signal node to MDS+ and substructures as subnodes, including fit and/or spline attributes. On instantiation, fit and/or spline attributes are read back in and the call method is created.
- .newx(): Use different values for one of the independent variables
- .real(): Real part of a complex instance
- .rebuild(): The operations leading to an instance are reapplied to recreate the instance for a different shot
- .save(): Write to a python cPickle file
- .skip(): Skip some points
- .smooth(): Smooth based on several filter options or on a user defined response function
- .spline(): Fixed or auto knot B-spline of variable order. Creates a call method for interpolation, derivatives, and integration.
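One way to picture rebuild() is an object that records the steps used to create it and replays them for another shot. The sketch below is only a guess at the general idea; the class `Recorded`, the step format, and the fake archive values are all hypothetical and say nothing about how the Data class stores its build information.

```python
class Recorded:
    """Hypothetical value that remembers how it was built."""

    def __init__(self, value, build):
        self.value = value
        self.build = build  # list of (operation, argument) steps

    @classmethod
    def load(cls, name, shot, reader):
        # First step of every build: read a named signal for a shot.
        return cls(reader(name, shot), [("load", name)])

    def scale(self, factor):
        # Each operation returns a new object with an extended build list.
        return Recorded(self.value * factor, self.build + [("scale", factor)])

    def rebuild(self, shot, reader):
        # Replay the recorded steps against a different shot.
        result = None
        for op, arg in self.build:
            if op == "load":
                result = Recorded.load(arg, shot, reader)
            elif op == "scale":
                result = result.scale(arg)
        return result


# Fake archive standing in for MDS+; the numbers are made up.
ARCHIVE = {("ip", 98893): 1.5e6, ("ip", 98891): 1.2e6}


def reader(name, shot):
    return ARCHIVE[(name, shot)]


ip = Recorded.load("ip", 98893, reader).scale(1.0e-6)
ip_other = ip.rebuild(98891, reader)
print(ip.value, ip_other.value)
```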
Data Class Methods (continued)
Methods on Data class objects
- .timing_domains(): Determine regions of continuous point spacing
- .tspline(): Splines with tension. Creates a call method with interpolation, derivative, and integration.
- .xslice(): Slice data based on x values rather than indices
- .copy(): Copy all attributes of an instance to another (possible linkages of mutable attributes)
- .deepcopy(): Full copy, with linked mutable attributes copied as well
- .shape(): Shape of the y array
Functions on Data class objects
- Join(): Join several identically shaped instances into a single instance with one extra dimension
- blend(): Blend two instances by combining along one x axis
- cdfget(): Read from a netcdf file
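Slicing by x value rather than by index can be sketched in a few lines. The free function `xslice` below is a hypothetical illustration on plain lists, not the Data method itself; it assumes x is sorted in ascending order and that both end points are inclusive. The sample values are made up.

```python
from bisect import bisect_left, bisect_right


def xslice(x, y, xmin, xmax):
    """Return the (x, y) points with xmin <= x <= xmax; x must be ascending."""
    lo = bisect_left(x, xmin)    # first index with x >= xmin
    hi = bisect_right(x, xmax)   # one past the last index with x <= xmax
    return x[lo:hi], y[lo:hi]


# Made-up time base (ms) and signal values.
t = [0.0, 500.0, 1000.0, 1500.0, 2000.0, 2500.0]
ip = [0.0, 0.8, 1.2, 1.2, 1.1, 0.3]
ts, ips = xslice(t, ip, 400.0, 2000.0)
print(ts)
print(ips)
```

The caller works in physical units (here milliseconds) and never needs to know the index of a sample.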
Data Class Functions
Functions on Data class objects
- dmdsput(): Write a dictionary of Data instances, strings, and arrays to MDS+; creates nodes and trees as needed
- listbuilds(): List the build attribute for all instances
- listdata(): List names and ranges for all Data instances
- listfiles(): Print a list of files with .data extensions (cPickle files)
- math_exceptions(): Define what to do with a math exception (/0)
- rebuild(): Rebuild several or all instances for a different shot
- restorebuilds(): Read the builds dictionary back from a file
- restoredata(): Restore an instance from a cPickle file
- savebuilds(): Save the builds dictionary to a file
- savedata(): Save all instances to cPickle files
- Arccosh(), Arcsin(), Arcsinh(), Arctan(), Arctanh(), Conjugate(), Cos(), Cosh(), Exp(), Log(), Log10(), Sin(), Sinh(), Sqrt(), Tan(), Tanh()
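A selectable policy for math exceptions such as a zero divide might look like the sketch below. The function `divide`, its `on_error` parameter, and the two policy names are hypothetical; they only illustrate the choice between masking a bad point and raising an error, not the actual math_exceptions() interface.

```python
def divide(num, den, on_error="mask"):
    """Element-wise division with a selectable zero-divide policy.

    Returns (values, mask); mask[k] is True where the result is invalid.
    """
    values, mask = [], []
    for a, b in zip(num, den):
        if b == 0:
            if on_error == "raise":
                raise ZeroDivisionError("zero divide in element-wise division")
            values.append(0.0)  # placeholder, flagged invalid by the mask
            mask.append(True)
        else:
            values.append(a / b)
            mask.append(False)
    return values, mask


vals, mask = divide([1.0, 2.0, 3.0], [2.0, 0.0, 4.0])
print(vals, mask)
```

Masking keeps the rest of the signal usable, while the "raise" policy stops the calculation at the first bad point.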
Example of Reading and Plotting MDSplus Data
>>> from screens import *; from data import *
>>> i = Data( 'ip', 98893 ).smooth(50.)/1.e6
t(ms) ip(amperes)
>>> poh = i * Data( 'vloop', 98893 )
x0(ms) vloop(v)
>>> ne = Data( 'densr0', 98893 ) ; ne1 = ne.rebuild( 98891 )
x0(ms) densr0(/m^3)
x0(ms) densr0(/m^3)
>>> psi = Data( 'psirz', 98893 ).xslice( (2,2000) )
x0(m) x1(m) x2(ms) psirz(vs/rad)
>>> s = Screen()
>>> s.ad( i ); s.ad( poh )                                  # curves c0, c1 in graph g0 of window w0
>>> s.ag() ; s.ad( ne ) ; s.ad( ne1 )                       # graph g1 with curves c2, c3
>>> s.aw() ; s.ad( psi, surface=1, color_table='heat' )     # window w1, surface plot in graph g2
>>> s.ag( aspect = 'auto' ) ; s.ad( psi, n_contours = 20 )  # graph g3, contour plot
>>> s
0: w0 -- 0: g0 -- 0: ( c0, i )--
                  1: ( c1, poh )--
      -- 1: g1 -- 0: ( c2, ne )--
                  1: ( c3, ne1 )--
1: w1 -- 0: g2 -- 0: ( s4, psi )--
      -- 1: g3 -- 0: ( n7, psi )--
>>> s.pl()
Data Class Based Higher Level Applications and Interfaces to Other Codes
- Python interfaces to standard analysis codes (EFIT, ONETWO, ...) define functions for interacting with the analysis code IO, integrated into the Data class analysis structure, and for running the analysis codes.
  - IO may be extended, e.g. ONETWO data can be written to MDS+
- Run functions in the python interfaces to analysis codes and in stand alone applications are controlled through tables on the postgresql server and activated through simple command lines, e.g. efit.py -r 122336.
  - The Pgaccess GUI to the postgresql server allows adjusting a large number of settings without putting them on the command line or creating a custom GUI widget for every application (also ODBC). Installed on all platforms.
  - pgadmin3: a better GUI to postgresql, installed on Linux only
  - Permanent record of settings for a run
  - Run table entries are described here: http://diii-d.gat.com/~osborne/python_d3d.html
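The table-driven control pattern can be sketched as a script that takes one key on the command line and reads every other setting from a database row. This sketch uses the standard library `sqlite3` module as a stand-in for the postgresql server; the table name `runs`, its columns, the stored values, and the interpretation of `-r` as a row key are all assumptions made for illustration.

```python
import argparse
import sqlite3


def load_settings(conn, run_id):
    """Return the settings row for run_id as a dict, or None if absent."""
    conn.row_factory = sqlite3.Row
    row = conn.execute("SELECT * FROM runs WHERE run_id = ?",
                       (run_id,)).fetchone()
    return dict(row) if row is not None else None


def main(argv, conn):
    parser = argparse.ArgumentParser(description="table-driven run (sketch)")
    parser.add_argument("-r", dest="run_id", type=int, required=True,
                        help="key of the table row holding the run settings")
    args = parser.parse_args(argv)
    settings = load_settings(conn, args.run_id)
    if settings is None:
        parser.error("no table entry for run %d" % args.run_id)
    return settings


# In-memory database with one made-up settings row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (run_id INTEGER PRIMARY KEY, "
             "mode TEXT, tmin REAL, tmax REAL)")
conn.execute("INSERT INTO runs VALUES (122336, 'snap', 2000.0, 4000.0)")
settings = main(["-r", "122336"], conn)
print(settings)
```

Because the settings live in a table, the row doubles as a record of how the run was configured, and any database GUI can serve as the editor.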
Data Class Based Applications and Interfaces to Other Codes (pyd3d package)
- efit.py: Runs efit on Thomson scattering times, in snap mode, or off kfiles. Also sets up and runs a kinetic efit based on profiles generated by profile.py, does edge p' and j variation for stability analysis, reads and writes data to MDS+/EFIT files, and reads EFIT data into Data structures.
- elm.py: Determines ELM timing. Also calculates ELM energy loss from fast efit analysis.
- fasteq.py: Runs EFIT using fast magnetics data to look at ELM effects (energy loss).
- profiles.py: Computes full cross section profiles with good edge resolution for electrons and ions. Stores results in MDS+.
- onetwo.py: Runs onetwo and deals with its output.
- profdb.py: Makes an entry in the ITER pedestal profile database.
- baloo.py: Runs baloo and deals with its output.
[Figure: application program control table shown in Pgaccess]
efit.py Application Example
efit.py functions
- setupdb_efit: Set up an entry in the efit_runs table for auto EFIT runs
efit_eqdsk.py functions (submodule of efit.py)
- convert_a(g): Converts an a(g)eqdsk between ASCII and big-endian binary
- get_a(g,m)dat: Reads all a(g,m)eqdsk data from MDS+ for a given shot into a dictionary of Data class objects
- read_a(g,m)files: Reads all a(g,m)eqdsk files in a given directory (./shot12345) into a dictionary of Data class objects
- write_k_from_mds: Read kfile data from MDS+ and write it to a file. Kfile data is only available in MDS+ for EFIT MDS+ data written with efit.py.
- write_g_from_mds: Read geqdsk data from MDS+ and write it to a file
- write_mds: Write aeqdsk, geqdsk, meqdsk, kfile, and snap file data to MDS+; update the code_run database
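Converting numbers between ASCII text and big-endian binary can be sketched with the standard `struct` module. This only illustrates the byte-order conversion; the record layout of real a(g)eqdsk files is not described here, and the function names below are hypothetical.

```python
import struct


def floats_to_bigendian(text):
    """Parse whitespace-separated numbers and pack them as big-endian doubles."""
    values = [float(tok) for tok in text.split()]
    return struct.pack(">%dd" % len(values), *values)


def bigendian_to_floats(blob):
    """Unpack a byte string of big-endian 8-byte doubles into a list."""
    n = len(blob) // 8
    return list(struct.unpack(">%dd" % n, blob))


blob = floats_to_bigendian("1.0 2.5 -3.0")
print(len(blob))                  # 3 values * 8 bytes each
print(bigendian_to_floats(blob))
```

The `>` prefix in the format string fixes the byte order regardless of the machine the script runs on.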
efit.py Application Example (continued)
efit_run.py functions (submodule of efit.py)
- autocheck_efit_runs: Periodically checks the progress of a set of EFIT runs distributed over the DIII-D computer cluster
- autorun_efit: Run a series of EFITs based on efit_runs table entries; distributes runs across DIII-D computers
- check_efit_runs: Check the progress of EFIT distributed subprocesses
- run_efit: Run EFIT in snap, kfile, or kfile creation mode, distributing runs across the DIII-D cluster with load levelling. Can run in snap mode for Thomson scattering times.
efit_kinetic.py functions and classes (submodule of efit.py)
- run_kinetic: Set up and run a free boundary kinetic EFIT based on profiles in MDS+ generated by profile.py, with the H-mode pedestal current density constrained to match the Sauter model (computed in efit_jsauter.py). CBOOT can be optimized against magnetics CHISQ. Magnetics data is averaged over the same intervals used in the profiles.
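Load levelling across a set of machines can be sketched as a greedy assignment: each job goes to whichever host currently has the least load. This is a generic scheme chosen for illustration, not necessarily what run_efit does; the function `distribute`, the host names, the job names, and the cost figures are hypothetical.

```python
import heapq


def distribute(jobs, hosts):
    """Assign each job to the host with the smallest load so far.

    jobs:  list of (name, cost) pairs
    hosts: dict mapping host name -> initial load
    Returns a dict mapping host name -> list of job names.
    """
    heap = [(load, host) for host, load in hosts.items()]
    heapq.heapify(heap)
    plan = {host: [] for host in hosts}
    # Place the most expensive jobs first (greedy longest-job-first rule).
    for name, cost in sorted(jobs, key=lambda job: -job[1]):
        load, host = heapq.heappop(heap)
        plan[host].append(name)
        heapq.heappush(heap, (load + cost, host))
    return plan


# Made-up time slices with relative costs, and two hosts with initial loads.
jobs = [("t2000", 3.0), ("t2100", 1.0), ("t2200", 2.0), ("t2300", 2.0)]
hosts = {"nodeA": 0.0, "nodeB": 1.0}
plan = distribute(jobs, hosts)
print(plan)
```

A real scheduler would also refresh the load figures while jobs run; the sketch only balances the initial assignment.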
efit.py Application Example (continued)
efit_kinetic.py (submodule of efit.py)
- Kfile class: subclass of Namelist
  - .kinetic: Build a kinetic kfile from profile data in MDS+
efit_fluxav.py (submodule of efit.py)
- fluxav: Flux surface average a set of standard or user supplied functions
- psicont: Generate flux contours at normalized flux values
efit_jsauter.py (submodule of efit.py)
- jsauter: Compute the Sauter bootstrap and fully relaxed Ohmic current density profile based on profiles in MDS+ and geqdsk parameters
efit.py Application Example (continued)
efit_kinetic.py
- vary_ped: Starting with a kinetic fit, vary pedestal characteristics in a series of fixed boundary EFIT runs: pedestal pressure, current, collisionality, width, and density and temperature separately. Used to map pedestal stability space.
Widget Based Visualization Tools
- pyd3tools (M.Wade): TK/BLT general data plotter
- Profplot.py: Glade/GTK+ widget for plotting (pgplot) profile.py fits and related quantities
- Pdb: GTK+ widget for plotting and fitting scalar database data (pgplot)
- Eqplot.py, Eqplot2.py (M.Makowski): Glade/GTK+ EFIT quantity plotter (pgplot)
- Sac (M.Makowski): GTK+ signal analysis widget (pgplot)
[Figure: pyd3tools (M.Wade), TK/BLT general data plotter]
Software Installation, Availability, and Documentation
- Python 2.5 and all packages are currently installed on the DIII-D NFS disks and maintained for Red Hat Enterprise Linux 4 and HP-UX 11.11 (also at PPPL on RHEL3).
- Previously built for RHEL3, Fedora 1-3, Solaris6.2, MacOS 10, and OSF1.
- Package set in RPM form for RHEL4,5 and Fedora 6,7
  - Signed RPMs installed in a YUM repository allowing automatic updates at https://diii-d.gat.com/~osborne/python/
- Sources in CVS, CVSROOT=/f/python/cvspython (not pyd3tools)
  - Group pyadmin has write access
- Executables in /f/python/$ospath/bin
  - Start with Python or IPython (nicer interface) to set the environment
- Documentation on packages and help on installation: https://diii-d.gat.com/~osborne/python/clickme.html