Adaptive Sampling and the Autonomous Ocean Sampling Network: Bringing Data Together With Skill

Lev Shulman, University of New Orleans
Mentors: Paul Chandler, Jim Bellingham, Hans Thomas
Summer 2003

Keywords: AOSN, Skill Metrics and Assessment, Graphic User Interface, Data Visualization

ABSTRACT

The AOSN (Autonomous Ocean Sampling Network) brings together sophisticated modern robotic vehicles with advanced ocean prediction models to improve our ability to predict the ocean. The operational system includes data collection by smart, adaptive platforms and sensors that relay information to shore in near real time (hours), where it is assimilated into numerical models that help visualize four-dimensional fields and predict future conditions. An essential part of the AOSN experiment is measuring skill. During the experiment the models predict the future behavior of the ocean, and based on their output the observational assets are deployed to the area of interest in a coordinated manner. Because AOSN utilizes two major model prediction systems (HOPS and ROMS) and many different data collection platforms (Scripps gliders, WHOI gliders, SeaWiFS, the Dorado AUV, etc.), three kinds of data comparison need to be presented to measure skill: model vs. model, model vs. observation, and observation vs. observation. The primary goal of my project was to develop a Graphic User Interface (GUI) for assessing and visualizing skill pertaining to these kinds of comparisons.

INTRODUCTION

Prior to the field experiment, that is, prior to the collection of observational data, a number of issues needed to be resolved. Because data is collected from a variety of instruments and vehicles, and therefore comes in many different formats and flavors, a common data format and a conversion path into it were needed. The idea was to ship the observational data to the modelers to have it interpolated; once the data is interpolated and represented in the same domain and format, coherent comparisons can be made. The field experiment would bring forth copious amounts of observational data in a large variety of formats, and these formats would need to be understood, properly documented, and archived in a uniform manner. Completing the tasks listed above was necessary to provide the underlying architecture and implementation of the skill assessment Graphic User Interface.
MATERIALS AND METHODS

LAYOUT OF THE SKILL ASSESSMENT GUI

The layout of the GUI needed to be simple but versatile. After some consideration and revision, a layout was decided upon. It consisted of the following:

- Three major axes: two for the datasets being plotted (model vs. model, model vs. observation, or observation vs. observation) and a third for the comparison between the two datasets.
- Two smaller axes along the bottom to display the Vertical RMS Error and the Vertical TCC Error.
- Dropdown menus to allow the user to switch between parameters (temperature, salinity, pressure, etc.) of a profile or dive.
- A button to display the actual data in a matrix or some other appropriate form.
- Menu items to allow the user to save plots in various graphics formats (.jpg, .gif, etc.).
- A user option to choose the window of data to visualize and compare; for example, viewing WHOI Glider and HOPS temperatures at depths between 5 and 30 m.

All of the AOSN visualization products were programmed and produced in Matlab, which is well suited to producing surface and three-dimensional interpolated plots and provides an extensive, user-friendly GUI builder (GUIDE). Below is an example of the Skill Assessment GUI layout.
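A minimal programmatic sketch of this layout in Matlab follows; the axis positions, control placement, and callbacks are illustrative assumptions, not the actual GUIDE-built interface.

    % Hypothetical sketch of the Skill Assessment GUI layout (illustrative only).
    fig = figure('Name', 'AOSN Skill Assessment', 'NumberTitle', 'off');

    % Three major axes: the two datasets being compared and their comparison.
    axData1   = axes('Parent', fig, 'Position', [0.05 0.45 0.27 0.45]);
    axData2   = axes('Parent', fig, 'Position', [0.37 0.45 0.27 0.45]);
    axCompare = axes('Parent', fig, 'Position', [0.69 0.45 0.27 0.45]);

    % Two smaller axes along the bottom for the vertical RMS and TCC errors.
    axRMS = axes('Parent', fig, 'Position', [0.05 0.08 0.40 0.25]);
    axTCC = axes('Parent', fig, 'Position', [0.55 0.08 0.40 0.25]);

    % Dropdown menu to switch between parameters of a profile or dive.
    uicontrol(fig, 'Style', 'popupmenu', ...
        'String', {'temperature', 'salinity', 'pressure'}, ...
        'Units', 'normalized', 'Position', [0.05 0.36 0.20 0.05], ...
        'Callback', @(h, evt) disp('redraw plots for the selected parameter'));

    % Button to display the underlying data in matrix form.
    uicontrol(fig, 'Style', 'pushbutton', 'String', 'Show data', ...
        'Units', 'normalized', 'Position', [0.30 0.36 0.15 0.05], ...
        'Callback', @(h, evt) disp('open data table'));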
DATA MINING

Although the layout was resolved, the architecture to feed data into the GUI was not yet in place. Again, data came in a large variety of formats: ASCII, binary, Matlab .mat files, NetCDF files, HDF files, etc. The data was dumped into directories on the server \\Polarbear, each named after the asset collecting the data. For example, the WHOI Gliders would send back data as ASCII (text) files with the extension .prf; these files would be stored in the \\Polarbear\aosnII\WHOIGlider\ directory. We took on the long and strenuous task of documenting and understanding what the data represented. Afterwards, software routines were implemented to read this raw data into Matlab.

At this point, the issue of data pedigree became imperative. How should the data be archived? The WHOI glider raw data files, for example, each captured a single profile, and a couple of weeks into the field experiment the WHOIGlider directory on \\Polarbear had filled up with over 1000 files. The SIO glider data, which represented essentially the same type of data, was archived as a single file per glider, with every newly collected profile appended to that file, producing only about seven files in the SIOGlider directory on \\Polarbear. Some consistency in storing and organizing the data, as well as proper version control and history, was needed. Data management design is a challenging problem, and a good design for the flow and pedigree of data became a priority.

Time was critical for many AOSN collaborators, and many users tended to read raw data from the \\Polarbear server for their own purposes, using their own software. It became clear that conversion and archival of data to a common, AOSN-specific interface was needed. When designing a system, there are several reasons for abstracting data into a common format. The A in AOSN stands for autonomous, and once all of the software is completed, the AOSN network should function in an autonomous fashion. It was therefore important to keep in mind not only the short-term software needs but the long-term objectives as well. By utilizing a single common data format, the flow and design of data is greatly simplified, and the implementation of autonomy throughout the system, and its interaction with the data, becomes far more intelligible. Robustness and modularity, coupled with the separation of data and process, can greatly reduce the potential for error in a system as well as substantially increase its efficiency. Before the common format existed, a separate reader had to be written for each asset's raw data; a sketch of such a reader is shown below.
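The following is a hedged sketch of such a per-asset reader for an ASCII profile file. The whitespace-delimited column layout (and the function name) is an assumption for illustration only; it is not the documented WHOI .prf format, which is handled by the whoiglider routines listed under Documentation.

    % Hypothetical per-asset reader of the kind the original data flow required
    % (one such routine had to exist for every asset).  The column layout below
    % is assumed for illustration.
    function d = read_profile_ascii(fname)
    raw = load(fname, '-ascii');      % numeric matrix, one row per sample
    d.time        = raw(:, 1);        % time, e.g. fractional day
    d.depth       = raw(:, 2);        % m
    d.latitude    = raw(:, 3);        % degrees north
    d.longitude   = raw(:, 4);        % degrees east
    d.temperature = raw(:, 5);        % deg C
    d.salinity    = raw(:, 6);        % psu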
Consider the following data flow diagram:

Figure 1

The task of the flow in Figure 1 is to get a window of data from an asset's directory on \\Polarbear\ to the Skill Metrics and Assessment GUI, or to any other application. The ovals represent processes that read in and extract a window of data from the raw data format. Because each asset's raw data is formatted and organized differently, the processes are all different, requiring a separate subroutine for each asset. If one asset's data format changes, an accommodating change has to be made in the corresponding process and in the application as well. There is therefore no modularity, and data abstraction is absent from this design, even though most of the data shares a common time, depth, latitude, and longitude column structure. Although Figure 1 represents the exact data-to-application exchange that was actually in place, this data flow design is inefficient and can be improved. Consider the following diagram:

Figure 2

In this design, the raw data from the assets has already been converted to a common format. There is no need to read each asset's raw data separately, because it is all in the same format. The only process that needs to be implemented to get from the data to the application is the extraction of a window of data, and that process is the same for every asset. With this design, the modularity and robustness of the code are optimal, because each element of the flow diagram stands alone and does not depend on another element's specifics. There is also far less code to write and maintain, reducing the potential for error in the system. This is the design we decided to adopt.

NETCDF

The data was converted to NetCDF, an accepted standard in the scientific community. NetCDF (network Common Data Form) is an interface for array-oriented data access and a library that provides an implementation of the interface. The NetCDF library also defines a machine-independent format for representing scientific data. Together, the interface, library, and format support the creation, access, and sharing of scientific data, and are well suited to storing data in a time, depth, latitude, and longitude format (UNIDATA, web). Matlab provides a comprehensive set of tools for dealing with NetCDF, and I also compiled some Matlab/NetCDF routines for AOSN-specific purposes. We set out to convert each observation asset's data to a new common AOSN NetCDF format, consisting of time, depth, latitude, longitude, and data columns stored in .nc binary files. A minimal sketch of such a conversion is shown below.
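The sketch below writes one asset's columns to a single .nc file. It assumes a recent Matlab with the built-in nccreate/ncwrite functions (the original work used the separate Matlab NetCDF toolbox), and the variable names and file layout are illustrative rather than the exact AOSN conventions; the real conversions are performed by routines such as whoiglider2netcdf.m, listed under Documentation.

    % Hypothetical conversion of one asset's profiles to a common AOSN-style
    % NetCDF file.  Inputs are assumed to be equal-length column vectors.
    function asset2netcdf(ncfile, time, depth, lat, lon, temperature)
    n = numel(time);
    vars = {'time', time; 'depth', depth; 'latitude', lat; ...
            'longitude', lon; 'temperature', temperature};
    for k = 1:size(vars, 1)
        % All column variables share one record dimension.
        nccreate(ncfile, vars{k, 1}, 'Dimensions', {'obs', n});
        ncwrite(ncfile, vars{k, 1}, vars{k, 2});
    end
    ncwriteatt(ncfile, '/', 'title', 'AOSN-II observation data (illustrative)');

Once every asset is stored this way, a single extraction routine can serve all of them, which is exactly the Figure 2 design.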
DATA EXTRACTION TOOL

An efficient tool was needed to extract a window of data, selected by any parameter, from an asset's data. For example, to feed a visualization tool, all WHOI Glider data spanning June 27 to August 2 at 30 m depth might be needed. I set out to write this kind of tool using Matlab's object-oriented capabilities. Utilizing object-oriented programming techniques can improve the modularity and reusability of code, as well as provide a good level of abstraction. The completed tool allows the user to extract a window of data based on any parameter within the data, be it time, depth, temperature, salinity, etc. The tool is completely object oriented and provides a single subroutine for extracting data from any asset's NetCDF data, avoiding a collection of subroutines specific to each asset. The tool was documented and used to feed visualization products like the plots below; a sketch of its usage follows.
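The following is a hypothetical usage sketch of such a window extraction. The class name, constructor, method, and file name are illustrative stand-ins, not the documented AOSN routines (the real entry points include get_aosn_asset_data.m and the per-asset @class directories listed under Documentation).

    % Illustrative only: extract a depth/time window from an asset's
    % common-format NetCDF file and plot it.
    asset = aosn_asset('\\Polarbear\aosnII\WHOIGlider\whoiglider.nc');  % hypothetical class and file
    win = get_window(asset, 'depth', [5 30], ...                        % 5-30 m
                            'time', [datenum(2003,6,27) datenum(2003,8,2)]);
    plot(win.time, win.temperature, '.');
    xlabel('time'); ylabel('temperature');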
AUTOMATED CONVERSION

To realize the data flow design introduced in Figure 2, conversion to the AOSN NetCDF format has to be automated, and the data has to be updated frequently. I created a set of tools to convert a number of assets automatically, allowing the user to specify the time interval between conversions. The conversion can be run in two ways: as a Matlab process, in which a Matlab data conversion program repeats itself at a time interval specified by the user, or as a repeating server process, essentially a cron job running in a loop. The second option can be less efficient because Matlab has to be started up each time a new server process runs. A sketch of the first approach is shown below.
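This is a minimal sketch of the Matlab-process approach using a standard Matlab timer object. It assumes the conversion can be driven by calling the documented automate_asset_conversion routine with no arguments, which is an assumption made for illustration.

    % Illustrative only: repeat the AOSN data conversion once an hour from
    % within a single long-running Matlab session.
    t = timer('ExecutionMode', 'fixedRate', ...   % restart the countdown each run
              'Period', 3600, ...                 % seconds between conversions
              'TimerFcn', @(obj, event) automate_asset_conversion);
    start(t);
    % Later, to stop the automated conversion:
    % stop(t); delete(t);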
RESULTS

The following sets of tools were completed:

- Automated AOSN NetCDF data conversion tools
  o The Matlab NetCDF toolbox was modified to better suit AOSN purposes
- Skill Metrics and Assessment GUI layout
- A comprehensive set of data window extraction tools for a number of the assets
- WHOI Glider and Scripps Glider visualization products

Below is a diagram representing the flow of data for potential visualization products. The top is the data stored on the \\Polarbear\ server, and the bottom is the visualization product.

DOCUMENTATION

The following is a list of the functions and subroutines that were implemented and documented:

- \\polarbear\aosnii\software\observation_data_software\dorado1
  o \\polarbear\aosnii\software\observation_data_software\dorado1\dorado12netcdf.m
  o \\polarbear\aosnii\software\observation_data_software\dorado1\dorado_data.m
  o \\polarbear\aosnii\software\observation_data_software\dorado1\@dorado1\*.m
- \\polarbear\aosnii\software\observation_data_software\ptsurunderway
  o \\polarbear\aosnii\software\observation_data_software\ptsurunderway\ptSurUnderway_data.m
  o \\polarbear\aosnii\software\observation_data_software\ptsurunderway\@ptsurunderway\*.m
- \\polarbear\aosnii\software\observation_data_software\sioglider
  o \\polarbear\aosnii\software\observation_data_software\sioglider\all_sio2nc.m
  o \\polarbear\aosnii\software\observation_data_software\sioglider\get_sioglider_data.m
  o \\polarbear\aosnii\software\observation_data_software\sioglider\qc_sio_glider.m
  o \\polarbear\aosnii\software\observation_data_software\sioglider\sioglider2netcdf.m
  o \\polarbear\aosnii\software\observation_data_software\sioglider\sioglider_data.m
  o \\polarbear\aosnii\software\observation_data_software\sioglider\@sioglider\*.m
  o \\polarbear\aosnii\software\observation_data_software\sioglider\sioglider2netcdf.sh
- \\polarbear\aosnii\software\observation_data_software\whoiglider
  o \\polarbear\aosnii\software\observation_data_software\whoiglider\all_whoi_prfs2nc.m
  o \\polarbear\aosnii\software\observation_data_software\whoiglider\get_modifiedwhoi_time.m
  o \\polarbear\aosnii\software\observation_data_software\whoiglider\get_whoiglider_data.m
  o \\polarbear\aosnii\software\observation_data_software\whoiglider\qc_whoi_glider.m
  o \\polarbear\aosnii\software\observation_data_software\whoiglider\whoi_date_str2frac_day.m
  o \\polarbear\aosnii\software\observation_data_software\whoiglider\whoiglider2netcdf.m
  o \\polarbear\aosnii\software\observation_data_software\whoiglider\whoiglider2netcdf.sh
  o \\polarbear\aosnii\software\observation_data_software\whoiglider\whoiglider_data.m
- \\polarbear\aosnii\software\observation_data_software\get_aosn_asset_data.m
- \\polarbear\aosnii\software\observation_data_software\visualization\time_vs_depth.m
- \\polarbear\aosnii\software\aosn_utilities\
  o \\polarbear\aosnii\software\aosn_utilities\aosn_mat2str.m
  o \\polarbear\aosnii\software\aosn_utilities\aosn_nc2cdl.m
  o \\polarbear\aosnii\software\aosn_utilities\automate_asset_conversion.m
  o \\polarbear\aosnii\software\aosn_utilities\depth_.m
  o \\polarbear\aosnii\software\aosn_utilities\get_interp_3d.m
  o \\polarbear\aosnii\software\aosn_utilities\nc2mat_struct.m
  o \\polarbear\aosnii\software\aosn_utilities\nc2mfile.m
  o \\polarbear\aosnii\software\aosn_utilities\pad_num_str.m
- \\polarbear\aosnii\software\aosn_utilities\setup\setup_mammoth_path.m
- \\polarbear\aosnii\software\aosn_utilities\setup\lev\setup_win_path.m
- \\polarbear\aosnii\software\aosn_utilities\time_conversion_functions
  o \\polarbear\aosnii\software\aosn_utilities\time_conversion_functions\frac_day2str.m
  o \\polarbear\aosnii\software\aosn_utilities\time_conversion_functions\fractional_day2month_day_time.m
  o \\polarbear\aosnii\software\aosn_utilities\time_conversion_functions\get_fractional_day.m
  o \\polarbear\aosnii\software\aosn_utilities\time_conversion_functions\unixsecs2frac_day.m

DISCUSSION

Competing with time was perhaps the most challenging aspect of the project. A sophisticated and efficient design for a system as complex and demanding as the AOSN data flow network is a rather daunting task, and in the long run a good initial design is much more favorable than a poor but promptly completed one.

CONCLUSIONS/RECOMMENDATIONS

Although Matlab's object-oriented capabilities are rather new, I highly recommend the object-oriented approach. It creates manageable, modular code that is easier to maintain in the long run.

During the experiment, CVS (Concurrent Versions System) software was considered. AOSN data goes through many phases and stages and is fed into many different applications.
A proper way to archive, store, organize, and track the changes in the data is important, and CVS may be the tool needed.

ACKNOWLEDGEMENTS

Paul Chandler, Hans Thomas, Jim Bellingham, the entire AOSN team, and a great thanks to George Matsumoto.

References:

UNIDATA. NetCDF (network Common Data Form). http://www.unidata.ucar.edu/packages/netcdf/