Forecast Entry
Douglas Paterson, Economics Department, OECD
Proceedings of the Fourth International FAME Users Conference

Introduction

The OECD Economics Department publishes the OECD Economic Outlook every six months: a two-year projection of output, employment, prices, trade and current balances for each of the OECD member countries. The data-entry software used by country desk economists and international trade specialists during this exercise is part of a larger package which includes the OECD INTERLINK model. This macroeconomic model is a representation of the world economy, integrating in a coherent and globally consistent manner semi-annual models for each OECD member country, smaller trade and balance-of-payments models for six non-OECD country zones, international trade flows, exchange rates and financial flows.

For performance reasons, the modelling part of this package was ported some time ago to a UNIX environment; large-scale macro-policy analysis work is now carried out almost exclusively on a Hewlett-Packard K200 series UNIX server running HP-UX 10.20. The data-entry side of the software, Forecast Entry, still runs today on a UNISYS A16 mainframe.

The prototype application described here is our first attempt to validate a new architecture for Forecast Entry (FE), based around the use of FAME as a back-end analytic server combined with JAVA as a front-end tool. This report is divided into three sections: the first describes in general terms a new architecture for FE and the choice of tools used in its construction; the second presents a prototype built using these tools and tested on a typical Economics Department PC; the third poses a number of implementation and methodological questions.

Forecast Entry: general architecture and choice of tools

The basic idea is to put not only the data but also as much as possible of the application structure and "intelligence" into the back-end.
For example, FE includes for each country a set of data transformations or identities: essentially a set of formulae which define an accounting framework for that country. Government Fixed Capital Formation (IGV) for the United States, for instance, is disaggregated into Federal and Local Government categories (IGFV and IGLV). The US desk economist prefers to enter projections for these individual categories and have IGV determined as the sum of the two. In this set-up, IGFV and IGLV are described as forecast variables and IGV is an identity variable.

In our new architecture, the set of all formulae for the identity variables in FE should be stored and evaluated in the back-end through the use of stored procedures (user-defined procedures in FAME): when the user alters data for a forecast series displayed in a spreadsheet-like interface on the front-end, that data would be transferred to the back-end, and a stored procedure called to update the identity variables dependent on that data. The front-end would then re-display the spreadsheet with the updated data.

Back-end

FAME was an obvious candidate for the back-end software. Experience within the Department and elsewhere suggested that FAME's analytical capabilities are probably sufficient for the task, and programs written in other languages can access a subset of FAME functionality and FAME databases through the C "Host Language Interface" (C HLI). Aside from this basic functionality, FAME is of particular interest in the context of the Economics Department:

- FE has close links with the Economics Department Analytic Database (ADB), which is being migrated progressively to FAME. Housing FE in FAME should simplify the interfaces between FE and the ADB, and provide scope for simpler management of the two systems.
- Increasingly, ad hoc demands for tables and graphs using FE data are being processed outside of the FE system using FAME. The FAME procedures developed for these purposes could be integrated into FE if their use became more routine in nature.
- FAME supplies an EXCEL add-in called the "FAME Populator", which provides read-only access to FAME databases. With the FE databank stored in FAME, users would have very easy access to data using EXCEL, which is part of the standard set of OECD desktop software supplied to all users.
- The number of skilled FAME users is growing rapidly in the Economics Department. The use of FAME for FE would widen the scope for interoperability and transferability of staff.
- FAME is a "standard" OECD software package, which means a high level of support is provided by ITN (the OECD's Information, Technology and Network Services Directorate).

The picture is not all rosy, however:

- FAME's current ODBC (Open Database Connectivity) interface is read-only. Programs which need to read or write to FAME databases have to be developed specifically for FAME using the FAME C HLI.
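The forecast/identity mechanism just described can be illustrated with a small, self-contained sketch. In the architecture above this calculation would live in a FAME stored procedure; the in-memory Map standing in for the work database and the updateIdentities method are inventions for this sketch. Only the series names (IGFV, IGLV, IGV) come from the US example in the text.

```java
// Illustrative only: a minimal in-memory stand-in for the kind of identity
// recalculation the back-end stored procedure would perform after an edit.
import java.util.HashMap;
import java.util.Map;

public class IdentityDemo {
    // One value per semi-annual period, keyed by series name.
    public static Map<String, double[]> db = new HashMap<>();

    // The identity: IGV is defined as the sum of its two components.
    public static void updateIdentities(int nPeriods) {
        double[] igfv = db.get("IGFV");
        double[] iglv = db.get("IGLV");
        double[] igv = new double[nPeriods];
        for (int t = 0; t < nPeriods; t++) {
            igv[t] = igfv[t] + iglv[t];   // IGV = IGFV + IGLV
        }
        db.put("IGV", igv);
    }

    public static void main(String[] args) {
        db.put("IGFV", new double[] {100.0, 102.0});
        db.put("IGLV", new double[] {200.0, 204.0});
        updateIdentities(2);                  // as if triggered by a cell edit
        System.out.println(db.get("IGV")[0]); // 300.0
        System.out.println(db.get("IGV")[1]); // 306.0
    }
}
```

In the real system the front-end would then re-read IGV from the work database and refresh the displayed worksheet.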
- The current FAME OLE (Object Linking and Embedding) interface "exposes" only a limited amount of FAME functionality; in particular, it is not possible to write back to FAME using this interface. Here again the C HLI must be used.
- FAME provides no built-in tools for co-ordinating and managing multiple-user updating of FAME databases. This is important during the forecasting rounds, which work under very tight time constraints; country desk and international trade economists need practically instant access to each other's data in order to work within stringent deadlines.
- The lack of native matrix objects in FAME somewhat complicates the storage of weighting-matrix type information. A number of matrices of this sort are stored in FE databanks, and are used in calculations of international variables, e.g. effective exchange rates.

The experience of other organisations, such as the Federal Reserve Board in Washington and Statistics Canada, suggests that the problem of controlling and co-ordinating multiple-user updating is certainly not insurmountable. Regarding matrices, techniques have already been developed within the Economics Department to handle matrix operations using FAME procedures. The lack of open database standards and a read-only OLE interface are more annoying; FAME does, however, provide "lower level" tools, namely the C HLI, for accomplishing the necessary tasks.

Front-end

Having decided provisionally on FAME for the back-end, this left the question of how to develop a visual interface or front-end. An extension to the FAME language called FAME Windows exists for developing FAME-specific front-ends. This extension was first developed for the UNIX version of FAME, and later ported to Windows NT 3.51 and Windows 3.1. FAME Windows was excluded as the front-end tool for a number of reasons, the most important being that it did not exist for Windows NT 4.0 when the project began. In addition, FAME Windows interfaces are relatively heavy to develop, and there seemed to be some question about further development of this product by FAME in the longer term.

This meant looking for a separate visual front-end tool which could interface with the FAME C HLI. Of the various tools available, JAVA seemed a promising candidate at the outset. So-called "native methods" can be used in JAVA to interface with the C HLI. More importantly, JAVA by its very nature is built to be portable; the issue here was that of the platform for FE. FAME runs on both PC and UNIX platforms, and we were not sure at the outset that performance on a typical Economics Department PC would be sufficient.
We wanted to retain the possibility, at least in theory, of a front-end that could be made to function on either a PC or the UNIX server. These considerations resulted in an initial choice of JAVA for the front-end. This choice, made at the end of 1996, did constitute a calculated risk, however, and one that needed to be evaluated after the first few months of the project.

Development of a prototype

Objectives

The approach outlined above seemed a priori feasible. A decision was taken to develop a prototype with the objective of answering the following questions:

- Can JAVA be used to develop a robust, rapid and stable visual interface to FAME, using the FAME C HLI and JAVA native methods?
- Is it possible to execute FAME commands and run FAME procedures through this interface? That is, the visual front-end must not only transfer data backwards and forwards between the application and the FAME back-end, but must also be able to execute FAME procedures on the back-end.
- Is the resulting performance, with all software running on the client machine, acceptable on a target Economics Department PC (a 60-75 MHz Pentium with 32 MB of RAM running Windows NT 4.0)?
Development environment and tools

Development of the prototype was carried out using a 120 MHz Dell Pentium (model GXL5100) with 64 MB of RAM running Windows NT 4.0. The JAVA front-end was built mostly within the Visual Cafe Pro JAVA development environment from Symantec Corporation, which is based on the 1.0.2 release of JAVA.

It was clear that the front-end would have to include some sort of spreadsheet-like interface, preferably with tab sheets for displaying and editing data in a number of separate tables. The available JAVA spreadsheet toolkits were reviewed, and the Microline Toolkit chosen. This product is now sold by Neuron Data (see www.neurondata.com). A trial version of the product was available from the Internet site. A fully featured version costs US$399 per developer, with no run-time fees. The Microsoft Visual C++ 4.00 compiler was used to build the library of functions around the FAME C HLI.

Functional description of prototype

JAVA classes

A small prototype set of JAVA classes was developed to represent FAME objects such as FAME databases, namelists, case series, time series and the FAME server session (the back-end FAME session). These classes are, and will probably remain, quite primitive. The classes include "native methods", which in turn call various functions in the FAME C HLI to read and write data to FAME databases and to execute FAME commands through the FAME server session. The native methods are combined together in a Windows DLL (dynamic link library).

Structural information

An important principle established at the outset was that, as far as possible, all structural information or "intelligence" about the FE application should be stored in the back-end, and thus in FAME itself. What this means is that a FAME database, probably the FE database itself, should contain all the structural information about the application in addition to the numbers.
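The kind of wrapper class described under "JAVA classes" might be sketched as follows. This is a hedged illustration, not the prototype's actual API: the class name, the native method signatures and the DLL name are all invented here. The real native methods call functions of the FAME C HLI, whose signatures are not reproduced.

```java
// Hypothetical sketch of a JAVA wrapper around the FAME server session.
// All names are illustrative; the actual prototype classes differ.
public class FameSession {
    // Implemented in C on top of the FAME C HLI and packaged in a Windows DLL.
    public native void open();                      // start the back-end FAME session
    public native void execute(String command);     // run a FAME command on the server
    public native double[] readSeries(String name); // read a time series from the work database
    public native void writeSeries(String name, double[] data);
    public native void close();

    // In real use the DLL would be loaded once, e.g.:
    //   static { System.loadLibrary("famehli"); }   // DLL name is hypothetical
    // It is omitted here so the sketch compiles without the library.

    // Pure-JAVA helper (also illustrative): build the command string that asks
    // the back-end to re-run the identity procedure for a given country.
    public static String identityCommand(String country) {
        return "exec run_identities_" + country;    // hypothetical procedure name
    }
}
```

The point of the design is that the JAVA side stays thin: everything beyond moving data and issuing commands lives in FAME itself.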
For example, as a matter of convenience, the accounts for each country are separated into functional groupings: Expenditure, Households, Government, etc. Each of these accounts can be visualised as a separate worksheet, and corresponds to a set of associated forecast and identity variables. The list of variable names for each FE worksheet or SHOW table for each country would be stored in FAME as a string case series. This should reinforce the overall coherence of the application, and ensure a very clean, easily documented structure. For instance, changing a SHOW table for a country should only involve modifying a list of variable names in a FAME database. None of this information should be "hard-coded" in the JAVA front-end.

Database

Procedures were developed to extract all the FE data, model add-factors, and parameters for a single country from the current production system databank in text form, and to convert the resulting ASCII file into a valid FAME input file format. A certain amount of metadata attached to each variable in FE is also extracted. A FAME database for the United States was constructed in this way. To this database were added a number of FAME data objects representing the structural information described above: for example, the set of available countries, the available SHOW tables for the United States, security information, etc.

Data transformations

Central to the operation of FE is the set of data transformations or identities for each country. These calculations allow a complete data set for a country to be built from data for the forecast variables. For the United States, this consists of approximately 280 separate calculations of parameters and identity series. These equations need to be resolved on a period-by-period basis, but there is no circularity in the system within each period. Two very different approaches to converting these equations into FAME were tried. Their relative strengths and weaknesses are analysed below.

How the prototype works

Figure 1. Forecast Entry: choice of country and access mode

Initialisation

The mode of operation is for the JAVA front-end to fire up and open what is essentially a FAME session in the background. This FAME server session has a temporary FAME work database associated with it. The FAME C HLI provides read and write access to this temporary database (as well as to any closed permanent database). Using this mechanism, the front-end can read and write to the work database, in addition to executing FAME commands which can access the same data. Immediately after opening the FAME server session, a certain number of FAME commands are executed to initialise the associated work database with data from the FE databank. The front-end also opens a read-only connection to the databank, reads a list of country names and security information, and presents this list to the user (see Figure 1).
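The initialisation handshake just described can be expressed as an ordered sequence of back-end commands. The command strings and the databank name below are hypothetical placeholders, not actual FAME syntax from the prototype; only the sequence (seed the work database, open the databank read-only, read the country list) follows the text.

```java
// Illustrative only: the initialisation sequence as an ordered command list.
import java.util.ArrayList;
import java.util.List;

public class FeInit {
    public static List<String> initCommands(String databank) {
        List<String> cmds = new ArrayList<>();
        cmds.add("open read-only " + databank);  // read-only connection to the FE databank
        cmds.add("copy " + databank + " work");  // seed the temporary work database
        cmds.add("read country-list");           // structural info: available countries
        return cmds;
    }

    public static void main(String[] args) {
        for (String c : initCommands("fedata")) {
            System.out.println(c);
        }
    }
}
```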
Figure 2. Forecast Entry: spreadsheet interface for data editing

Interaction with user

On selecting a country, the front-end reads a certain amount of country-specific information from the FAME database, representing the set of all SHOW tables available for that country. It then reads all the corresponding time series and displays them in a series of separate tabbed worksheets, for the time being in level form only. The resulting window can be resized sideways to fit more data on the screen. The horizontal scrollbar allows the user to look at data over different periods. Identity and forecast variables are identified by separate colours (see Figure 2).
Figure 3. Identity variables automatically calculated on data cell validation

Execution of identities

Whenever a user modifies a data cell in the worksheet, data for the corresponding FAME time series is updated in the temporary FAME work database. The identities are run (see below) using this modified data, and the currently displayed SHOW table worksheet is updated with data for the identity-type variables (see Figure 3).
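The period-by-period resolution of the identity system noted under "Data transformations" can be sketched as follows: within each period the identities are evaluated in dependency order, which is possible because there is no circularity within a period. IGV = IGFV + IGLV is the US example from earlier; the series ITV, the second identity ITOT and the in-memory database are invented purely for illustration.

```java
// Sketch of period-by-period identity evaluation with no within-period circularity.
import java.util.HashMap;
import java.util.Map;

public class PeriodSolver {
    public static Map<String, double[]> solve(double[] igfv, double[] iglv, double[] itv) {
        int n = igfv.length;
        double[] igv = new double[n];
        double[] itot = new double[n];
        // Period by period; within a period, IGV must be computed before ITOT.
        for (int t = 0; t < n; t++) {
            igv[t] = igfv[t] + iglv[t];   // first identity: IGV = IGFV + IGLV
            itot[t] = igv[t] + itv[t];    // hypothetical identity using same-period IGV
        }
        Map<String, double[]> out = new HashMap<>();
        out.put("IGV", igv);
        out.put("ITOT", itot);
        return out;
    }

    public static void main(String[] args) {
        Map<String, double[]> r = solve(
            new double[] {10, 11, 12},   // IGFV
            new double[] {20, 21, 22},   // IGLV
            new double[] {5, 5, 5});     // ITV (hypothetical)
        System.out.println(r.get("ITOT")[2]); // (12 + 22) + 5 = 39.0
    }
}
```

Ordering the roughly 280 US calculations this way is what lets the whole system be resolved in a single pass per period.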
Figure 4. FAME server dialog started from the Window menu

FAME server dialog

At any time, a FAME server dialog can be called up from the Window menu. This gives direct access to the back-end FAME session and work database and is useful for debugging purposes. Commands entered here are executed immediately by the FAME server session. Output for the time being goes to the standard FAME output window. Figure 4 displays the set of formulae linking the forecast variable CPV (volume of private final consumption expenditure), through the identity variables FDDV and TDDV, to GDPV (gross domestic product) for the United States. The rather complicated formula for GDPV, not all of which is shown, generates a projection for GDPV depending in part on the growth rate of CPV.

Prototype evaluation: response time

Performance of the prototype was a crucial aspect of the evaluation. Ideally, the identities should be run each time a worksheet cell is updated. This is not in fact the case with FE in our current mainframe production system, where the user enters a whole line of data at a time and the identities are run only when the user finally types a carriage return at the end of the line. The front-end could be designed to behave in the same way by including a "submit" button, but this would be second-best (it would of course be possible to include both possibilities through menu options). Considerable effort was therefore made to cut the response time to a minimum. Tests were carried out on both the development machine and a middle-of-the-range PC (see Table 1).

JAVA

An initial concern with JAVA was that it might be too slow compared with a fully compiled language like C or C++. This proved not to be the case in the prototype. The Just-In-Time (JIT) compiler included with Visual Cafe Pro, which essentially fully compiles the JAVA "byte code" on the fly, is clearly very effective. This might be a concern, however, if for performance reasons in the rest of the application it proved necessary to run everything in the UNIX environment. In the response times reported in Table 1, the overhead involved in the front-end is relatively constant and of the order of 0.2 or 0.3 seconds. Most of the time is spent in the FAME back-end.

FAME

Most of the time in FAME is spent evaluating the identities. Basically two techniques are possible, involving either FAME formulae or FAME models, and they give significantly different results (see Table 1). Aside from speed, transition costs, overall flexibility and maintainability are other factors which must be taken into consideration. Table 2 summarises most of these aspects, which are explored in more detail in the following paragraphs.

FAME formulae

One possibility is to code the identities as FAME formulae. Formulae are stored in FAME databases as simple text. Analysis and evaluation of this text is carried out only when the formula is used in some way, for example to update the values of a data series over some period. For most of the FE identities, conversion into FAME formulae is relatively straightforward.
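The lazy, text-based behaviour of formulae described above can be mimicked in a few lines: the formula is stored as plain text and only parsed and evaluated when a value is actually requested. The toy parser below handles only formulae of the form "X = A + B" and is purely illustrative; it is not how FAME itself analyses formula text.

```java
// Illustrative only: a formula kept as text and evaluated on demand.
import java.util.HashMap;
import java.util.Map;

public class LazyFormula {
    public static Map<String, Double> values = new HashMap<>();

    // Evaluate "TARGET = A + B [+ C ...]" only when called.
    public static double evaluate(String formulaText) {
        String[] sides = formulaText.split("=");
        String[] terms = sides[1].trim().split("\\+");
        double result = 0.0;
        for (String term : terms) {
            result += values.get(term.trim());
        }
        values.put(sides[0].trim(), result);
        return result;
    }

    public static void main(String[] args) {
        values.put("IGFV", 100.0);
        values.put("IGLV", 200.0);
        String stored = "IGV = IGFV + IGLV";  // stored as simple text
        System.out.println(evaluate(stored)); // parsed only now: 300.0
    }
}
```

This per-use parsing is cheap for the handful of formulae behind one SHOW table, but, as the timings below show, it becomes very costly when the complete set of identities must be evaluated.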
It is a little more difficult for stock or rate-of-growth type variables, which require the use of FAME functions, and very complex for some autorecursive equations. The very particular form of the GDPV forecast identity for the United States also proved difficult to code as a formula.

Formulae response times

Evaluation of a small number of formulae can be very fast, and of course, for the SHOW tables, only those variables actually displayed to the user at any one time need to be evaluated. Table 1 shows these times for the prototype. Evaluating large numbers of formulae can be very slow, however, and herein lies the principal defect of this approach: executing the complete set of formulae for the United States over a 20-semester period takes 84 seconds on the target PC!

FAME models

The other possibility is to code the identities as a FAME model. Coding the identities in this way is a heavy process, requiring for example the declaration of all right-hand-side coefficients not actually determined by the identities. The forecast rules which we use occasionally (for example, to make a variable grow over the forecast period at the last known historical ratio of some other variable) proved impossible to code in a FAME model: the historical ratios used in these rules need to be determined outside of the model. Models need to be precompiled by FAME before they can be used. "Activation" of the model, which implicitly or explicitly involves fixing the period over which the model will subsequently be run, automatically generates a separate temporary database which is not directly accessible via the C HLI. Changing the period for execution of the model means recreating this temporary database.
Model response times

The big advantage of models overall is in execution time. As can be seen from Table 1, using the model over a 20-semester period to update any SHOW table is slower than evaluating a small number of formulae over the same period; but in the case of the model, all the identities are being evaluated (although only data for the subset of time series actually being displayed is transferred to the JAVA front-end). At certain times during an FE session, this complete evaluation would have to be carried out in any case, for instance when updating a permanent database or file, or when printing summary tables. This takes under 2 seconds for the model and roughly 84 seconds using formulae!

Table 1: Forecast Entry prototype response times

                                                        SHOW1   SHOW3   All identities
Identities coded as FAME formulae
  Development machine: Dell GXL5100 (100 MHz Pentium)    0.42    0.89       49*
  Target machine: Siemens 5HPCI (60 MHz Pentium)         0.75    1.59       84*
Identities coded as FAME model
  Development machine: Dell GXL5100 (100 MHz Pentium)    1.12    1.04       1.1*
  Target machine: Siemens 5HPCI (60 MHz Pentium)         1.81    1.77       1.46*

Notes:
1. Time in seconds from validation of spreadsheet cell to redisplay of table.
2. Period for calculation: 89S1 to 98S2 (20 semesters).
* Execution of an appropriate FAME user procedure only (i.e. time does not include transfer to JAVA); 280 identities in all evaluated.

Questions for future development

Formulae vs. models

A choice needs to be made between these two very different ways of coding the identities. Table 2 compares and contrasts the relative advantages and disadvantages of the two methods.
Table 2: Formulae vs. FAME model representation of identities

Translation of INTERLINK identities into FAME format
  Formulae: Relatively straightforward for most identities; FAME functions necessary for stock and rate-of-growth variables; complex ad hoc techniques for some variables.
  Models: Some forecasting rules cannot be coded directly in the model; country-specific initialisation necessary; heavier translation cost than for formulae.

Storage
  Formulae: Stored as FAME objects in FAME databases along with the rest of the data.
  Models: Text version of the model is "precompiled" into a model file, much like a FAME procedure, and stored separately from a FAME database.

Ease of modification
  Formulae: Can be modified directly from the FAME command line. Very flexible.
  Models: Model text file must be edited and recompiled. Relatively inflexible.

Execution period
  Formulae: Can be controlled implicitly by the current FAME date or explicitly at the command level.
  Models: Must "reactivate" the model and re-initialise the temporary model database. The fact that execution is fast means that this is less of a problem.

Speed of execution
  Formulae: Very rapid for a small number of formulae. Extremely slow for the complete set of identities.
  Models: Not quite twice as slow as formulae for a typical SHOW table. Approximately 50 times as fast for the complete set of identities.

JAVA

The JAVA front-end does not seem to introduce any speed bottleneck into the application, and if the experience of these first few months and the limited nature of the prototype are anything to go by, JAVA also seems to be reasonably robust. The one remaining reservation is that of stability, inherent in the fact that the language is evolving rapidly, with important improvements in functionality being incorporated in each new release. Development environments such as Visual Cafe Pro, used in the development of the prototype, necessarily lag behind in their support for the latest version of JAVA.
This is a classic problem for developers who need to combine a number of software tools, and a constant source of frustration and anguish! For the record, it has not so far resulted in any problem that could not be resolved one way or another. Portability, too, is affected by this rapid evolution: versions of JAVA for HP-UX or other platforms become available months after a new release for Windows NT/95 or Solaris.

OECD INTERLINK model and FE

In the new architecture, the FE databank becomes a FAME database. We use INTERLINK on UNIX for modelling work. If we decide to proceed with development along the lines described above, the databank on the UNIX platform should also be a FAME database with exactly the same format as that developed for FE on the PC. The interface between the UNIX version of INTERLINK and its databank will have to be rewritten using the FAME C HLI or FORTRAN HLI to take account of this new database structure.