CE 504 Computational Hydrology
Computational Environments and Tools
Fritz R. Fiedler

1) Operating systems
   a) Windows
   b) Unix and Linux
   c) Macintosh
2) Data manipulation tools
   a) Text Editors
   b) Spreadsheets
   c) Visualization
3) Programming languages and environments
   a) Languages
   b) Environments
   c) Building a program

This document is intended to be a review, and may also provide some helpful tips. Students in Computational Hydrology are expected to be fairly proficient computer users and have some familiarity with programming. To accommodate a wide variety of student backgrounds, there are generally no specific requirements with respect to hardware or software used in this class (there may be some models that everyone is required to use, but detailed instructions will be provided).

Operating Systems

An operating system manages the computer system's hardware and software. It provides a means for allocating resources (CPU, bandwidth, memory, etc.). It also provides an interface between applications and the system hardware, allowing software to operate without needing to know the hardware details. Operating systems are often grouped according to the type of machine: real-time control (e.g., computer numerical control or CNC machines), hand-held devices, desktop computers, workstations, servers, and supercomputers all have significantly different operating systems. The operating systems familiar to most people are Windows, Unix/Linux, and Macintosh. There are multiple flavors of each of these.

Windows operating systems were initially based on MS-DOS, developed by Microsoft. The three most recent (and useful) versions are Windows NT, Windows 2000, and Windows XP, and are widely installed on desktops, workstations, and servers. Each runs on machines with Intel-type processors. The relatively easy-to-use graphical interface (compared with command-line input) made Windows popular with people with little computer experience.
Unix is not a single operating system; rather, it is a type of operating system with many variations. It was initially developed by AT&T, and its source code was made available to universities for academic use. The University of California at Berkeley improved upon the initial version, resulting in the Berkeley Software Distribution (BSD) of Unix. Both free and proprietary versions exist. Original Unix systems were command-line driven, but
graphical interfaces have since become available. Linux is a free variation of Unix, and has recently become very popular. Linux graphical interfaces are becoming more sophisticated (the KDE and Gnome interfaces are good), and are beginning to rival Windows systems in terms of ease of use, but with much better performance.

The classic Macintosh OS is made by Apple Computer and runs on Motorola or IBM PowerPC processors. The newer Macintosh OS X is based on BSD Unix, but is still proprietary. It reportedly combines the power and stability of Unix with a familiar graphical user interface.

Data Manipulation Tools

Both model input and output data need to be manipulated in various ways. Input to the models may include climate forcing (precipitation, temperature, etc.), watershed geometry (area, stream network, topography, etc.), and physical characteristics (soil type, vegetation, etc.). Output includes hydrographs, water levels (groundwater or surface water), and soil moisture. Data may represent one, two, or three dimensions.

Many sophisticated commercial models have either integrated or third-party (external) data manipulation utilities. Hydrologic models integrated with Geographic Information Systems (GIS) are becoming more popular. GIS facilitates landscape analysis and provides an interface to common spatial data formats, and is thus also useful as a general tool.

There are numerous general tools available to handle input and output data, some of which are described in the following paragraphs. There are also many specific tools available to translate one format to another. It is often necessary to write your own!

Text Editors

Text editors are essential to hydrologic data manipulation. Most data available to the public are in text (ASCII) files (though the internal format varies, of course). Model output is often text as well. Model source code is also stored in text files.
While not always the most efficient method of manipulating data, text editors are often the most reliable method of examining the contents of a file. Note that word processors and spreadsheets can be used to edit text documents, but it is difficult to save the files correctly in text format. Several features are desirable in a text editor:

   Ability to handle large files
   Search and replace functions
   Row/column indicator
   Syntax highlighting support (if used for coding)
   Column mode (very useful but not common)

Every OS comes with a text editor. In Windows, Notepad is the default text editor. Unix and Linux systems usually have a variety of text editors, including the old standby vi, and Emacs. TextEdit is available on Macintosh systems. Several third-party software packages exist that expand upon the capabilities of these editors. Notepad is particularly weak. An excellent, inexpensive Windows text editor is UltraEdit. This program is used as an example herein.

Figure 1 is a screen capture of the UltraEdit (version 7) interface. Note the integrated directory structure (left window), multiple open files (three files are open,
Gen.inp is on top), and highlighted current line. On the bottom, note the location of the cursor. In Figure 2, a Fortran program is open with syntax highlighting in effect; note that keywords and symbols are colored. In Figure 3, a column is highlighted; in column mode, columns are manipulated as rows would be. Figure 1. UltraEdit screen capture.
Figure 2. UltraEdit with syntax highlighting. Figure 3. UltraEdit in column mode.
Spreadsheets

Spreadsheets are used extensively in data manipulation, and are very useful for creating 2-D plots. Since most people routinely use Microsoft Excel, this is the only spreadsheet that will be mentioned here. Also, only select aspects pertinent to manipulating text files are covered.

Excel will recognize a text file and offer options to parse the data into the spreadsheet cells. The two basic options are delimited or fixed width (Figure 4). The delimited option is useful when the values are not uniformly spaced, but are separated by spaces, tabs, or other characters. The fixed width option works well when the data form uniform columns, and does not require spaces between the values.

Figure 5 shows the text output options, where *.prn refers to space-delimited files, and *.txt to tab-delimited files. The number of spaces Excel puts between the values is related to how wide the columns are in the spreadsheet. Experimentation may be required to get the desired results, and creating input text files with a specific internal format this way is not recommended.

Figure 4. Excel text import options.
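The delimited and fixed-width options described above can also be reproduced in a short script. The following is a minimal Python sketch, assuming a hypothetical three-field record layout (date, precipitation, flow) and made-up sample values; none of the field names or column positions come from these notes.

```python
import csv
import io

# Hypothetical fixed-width layout: (field name, start column, end column).
# This layout is an assumption for illustration only.
FIELDS = [("date", 0, 10), ("precip_mm", 10, 16), ("flow_m3s", 16, 22)]


def parse_fixed_width(line):
    """Slice one fixed-width line into stripped string fields."""
    return [line[start:end].strip() for _, start, end in FIELDS]


def parse_delimited(line):
    """Split one line on any run of spaces or tabs."""
    return line.split()


def to_tab_delimited(rows):
    """Return the rows as tab-delimited text, one record per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample records. In the second line the last two values touch,
# so splitting on whitespace cannot separate them.
fixed_lines = [
    "2003-01-01   2.5  15.2",
    "2003-01-02  12.01043.7",
]

rows = [parse_fixed_width(line) for line in fixed_lines]
print(to_tab_delimited(rows), end="")
```

Splitting the second record on whitespace would merge its last two values into one token, which is why fixed-width parsing is the safer choice when columns are uniform but not always separated by spaces.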
Figure 5. Excel text file export options.

Visualization

Due to the typically large amounts of data involved in hydrologic modeling, visualization is essential. Of course, methods of visualization vary widely in complexity, and a whole course could be devoted to this subject. Only a very brief discussion is included here.

It is convenient to think of visualization in terms of data dimensions, i.e., one, two, three, and four. (Mental Exercise: think of specific hydrologic data that correspond to each of these dimensionalities.) Bar charts, histograms, and scatter plots are common ways to explore one- and two-dimensional data. Contour and image maps are used for spatial data. Three- and four-dimensional data usually require some means of compressing the higher dimensions (using coloring or shading, for example) into two dimensions. An example is shown in Figure 6, a color-filled contour map of the Paradise Creek watershed in Idaho. Other variables, such as areal extent of snow or precipitation depth, can be overlaid on such maps as well; the stream network and gage locations are shown in Figure 6. Animation can also be used, and is particularly useful for showing how spatial data (e.g., soil moisture) change over time. There are some nice (large) animations located here (go to the bottom of the page for the animations).
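As a small illustration of exploring one-dimensional data, the following Python sketch counts values into bins and prints a text histogram. The precipitation depths and bin edges are invented for illustration, and the function names are hypothetical; in practice a spreadsheet or plotting package would normally draw the chart.

```python
def histogram_counts(values, bin_edges):
    """Count values per bin [lower, upper); the last bin also includes its upper edge."""
    counts = [0] * (len(bin_edges) - 1)
    last = len(counts) - 1
    for value in values:
        for i in range(len(counts)):
            lower, upper = bin_edges[i], bin_edges[i + 1]
            if lower <= value < upper or (i == last and value == upper):
                counts[i] += 1
                break
    return counts


def text_histogram(counts, bin_edges, symbol="#"):
    """Build one text line per bin, with one symbol per counted value."""
    lines = []
    for i, count in enumerate(counts):
        label = "{:5.1f}-{:5.1f}".format(bin_edges[i], bin_edges[i + 1])
        lines.append("{} | {}".format(label, symbol * count))
    return "\n".join(lines)


# Made-up daily precipitation depths (mm), for illustration only.
precip_mm = [0.0, 0.0, 1.2, 3.4, 0.0, 7.8, 12.5, 0.6, 0.0, 20.0]
edges = [0.0, 5.0, 10.0, 15.0, 20.0]

counts = histogram_counts(precip_mm, edges)
print(text_histogram(counts, edges))
```

Values outside the range of the bin edges are simply not counted in this sketch.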
Figure 6. Color-filled contour map of the Paradise Creek watershed, Idaho.

Programming Languages and Environments

A programming language is a formalized set of rules for syntax and semantics. Syntax refers to the ways in which the symbols that comprise the language are combined, and semantics defines the meaning of the strings of symbols. One way to characterize a programming language is by its level, where low-level languages are closer to what the computer can read directly (e.g., machine code and assembly language), and high-level languages are closer to what humans can read (e.g., Fortran and C++). To transform higher-level languages into something the computer can understand, programs are either compiled into machine-readable code before the program is executed (run), or interpreted by the operating system or an interpreter at run time. Scripting languages, such as sed and awk used in Unix/Linux environments, are often thought of as interpreted languages. While these are often useful for data manipulation and simple programs, they are generally not used in developing sophisticated hydrologic models. Compiled code usually runs much faster.

Each computer language is designed based on some philosophy or particular approach to programming. Three of the basic approaches are:

Procedural Programming. Each module of a procedural code is made up of procedures, or sub-programs. Fortran was developed as a procedural language.

Structured Programming. Typically this method refers to breaking the program into sub-programs, as with procedural programs, but with each sub-program having only one entry and one exit mechanism. Pascal is a structured programming language.
Object-Oriented Programming. In this approach, emphasis is placed on the items (objects) manipulated as opposed to the actions taken by the computer. Objects are described by classes. A CAD drawing that describes a bicycle could be considered a class, and a bicycle an object. An object can be composed of both data and methods. C++ is a common object-oriented language, and many modern languages have at least some object-oriented capabilities.

Programming environments are software packages and tools that facilitate code development. Text editors (with syntax highlighting), debuggers, compilers, code optimization utilities (profilers), and data visualization tools may be combined under one graphical user interface into an integrated development environment (IDE). If an integrated environment is not used, the individual tools used in program development comprise the programming environment. While there is some learning involved in using an IDE for the first time, IDEs are very convenient once familiar. Figure 7 shows the Microsoft Developer Studio interface. This IDE integrates a text editor, compiler, debugger (being selected in Figure 7), and in later versions, a 2-D array visualizer. Debuggers provide a means to march through a program line by line and see the value of variables at each step. Figure 8 shows the Python IDLE interface. Python is an open source interpreted language.

Another class of programming environment is inherent in mathematical software packages such as Matlab and Mathcad. These packages support scripting languages unique to each, and allow for symbolic representation of operations and data structures. They are integrated with libraries of optimized code. For example, if the inverse of a matrix A were required, one would simply write A^-1 and the package would call the appropriate routine to compute the inverse without the user needing to know how it is done. It should be noted that libraries are also available for all popular compiled languages.
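To make the class and object distinction in the object-oriented approach above concrete, here is a minimal Python sketch. The class plays the role of the drawing and each object is one thing built from it, holding its own data and sharing the same methods. The LinearReservoir class, its attribute names, and all numbers are hypothetical and chosen only for illustration; they are not a model used in this course.

```python
class LinearReservoir:
    """Hypothetical storage element whose outflow is proportional to storage."""

    def __init__(self, storage_mm, k_per_day):
        # Data: each object carries its own state and parameter.
        self.storage_mm = storage_mm
        self.k_per_day = k_per_day

    def step(self, inflow_mm, dt_days=1.0):
        """Method: advance one time step (simple explicit update), return outflow in mm."""
        outflow_mm = self.k_per_day * self.storage_mm * dt_days
        self.storage_mm += inflow_mm - outflow_mm
        return outflow_mm


# Two objects created from the same class, each with different data.
upper = LinearReservoir(storage_mm=100.0, k_per_day=0.25)
lower = LinearReservoir(storage_mm=40.0, k_per_day=0.5)

# Outflow from the upper object becomes inflow to the lower one.
q_upper = upper.step(inflow_mm=10.0)
q_lower = lower.step(inflow_mm=q_upper)
print(q_upper, q_lower, upper.storage_mm, lower.storage_mm)
```

Keeping the data and the methods that act on it together in one class is one way to achieve modularity with respect to both data and function.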
The first steps in building a program have nothing to do with computers. First, the problem must be defined. What is the desired end result? Is the modeling being done to make a prediction or to learn something about the physical system? What are the important processes, with respect to the end result, of the physical system being simulated? What data are available? With these questions in mind, the equations that represent the physical system and appropriate solution techniques can be identified. Also, the general order of computations can be specified. The specific order may be a function of the computer language.

With the problem, solution method, and data well defined, the computer code can be written. It is generally a good idea to follow the general concept of modularity, both with respect to data and function. This applies to both the object-oriented and structured/procedural approaches.

There are two general paths to executing your code, depending on whether you are using a compiled or interpreted language:

   Compiled: source code -> compiler -> executable code -> run
   Interpreted: source code -> run (interpreted at runtime)
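The two paths above can be loosely imitated inside Python using its built-in compile and exec functions. This is only an analogy, offered as a sketch: Python compiles source text to bytecode for its own interpreter, not to a machine-code executable as a Fortran or C++ compiler would. The variable names and input values are made up for illustration.

```python
# A one-line "program" held as source text.
source_code = "total = sum(depths_mm)\n"

# Path 1, compiled style: translate the source once, then run the result.
code_object = compile(source_code, "<example>", "exec")
compiled_namespace = {"depths_mm": [1.5, 2.5, 4.0]}  # made-up input data
exec(code_object, compiled_namespace)

# Path 2, interpreted style: hand the source text straight to the interpreter.
interpreted_namespace = {"depths_mm": [1.5, 2.5, 4.0]}
exec(source_code, interpreted_namespace)

print(compiled_namespace["total"], interpreted_namespace["total"])
```

Both paths give the same result; the difference is when the translation happens, which is why compiled code can avoid repeating that work every time it runs.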
Figure 7. Microsoft Developer Studio. Figure 8. IDLE for use with the Python language.