DATA AND SOFTWARE INTEROPERABILITY WITH GAMS: A USER PERSPECTIVE



Amsterdam Optimization Modeling Group LLC
Erwin Kalvelagen, erwin@amsterdamoptimization.com

Modeling Languages
- Specialized modeling languages (GAMS, AMPL, ...) are very good at what they do
- Efficient, compact representation of a model
  - In a way that allows thinking about the complete model
  - And that allows and encourages experiments and reformulations
- Through maintainable models
  - Especially when they become really big
- Handle irregular and messy data well

API attraction: Σειρήν (the Siren)
- Many new users are seduced into programming against solver APIs
  - Familiar environment, no new funny language to learn
- But for large, complex models this is really a step back
  - Even when using higher-level, modeling-fortified APIs

Modeling Environments

  Environment            Solver API         Modeling API   Modeling Language
  GAMS, AMPL             -                  -              GAMS, AMPL
  IBM ILOG               Cplex API          Concert        OPL
  MS Solver Foundation   Solver-level API   SFS            OML
  GLPK                   API                X              MathProg

- Some programmers love: malloc, linker errors, compiler versions, library versions, makefiles, Windows vs. Unix, debuggers, ...
- But for others this is time taken away from modeling

Also increased use of optimization in:
- SAS (improved OR module)
- MATLAB (TOMLAB, solver interfaces)
- Mathematica, Maple (numerical routines)
- GAMS, AMPL
- Excel (MSF, ILOG, ...)
- R (solver interfaces, automatic differentiation)

Answer: Interoperability
- Make GAMS more attractive for programmers by allowing them to use external software
- Let the user decide what to do in GAMS and what in another environment
- Make data exchange as easy as possible
  - Even for large data
- Safety: this is a spot where lots of things can and will go wrong

Flexibility
- Do not decide for the user
- Data manipulation can be done in GAMS or in Excel
- Computation can be done in GAMS or in external software
- Allow these decisions to be made depending on the situation:
  - Skills
  - Available software
  - Suitability

E.g. data handling: Office vs. GAMS
- Where to put functionality: in the data source environment or in the modeling system?
- (Spectrum: Office ... OML ... AMPL ... GAMS)
- This also has to do with procedural vs. declarative styles, and with data manipulation capabilities

GDX in practice
- It really works
  - Binary, fast
  - You can look at it
- Limitations:
  - Cannot add records or symbols (e.g. combine two GDX files): GDX is immutable
  - Not self-contained w.r.t. GAMS: needs declarations
  - Zero vs. non-existent
- GAMS interface:
  - $load is dangerous
  - Compile time vs. execution time
  - Execution-time limits (each call writes a separate GDX file; cannot add set elements at execution time)

Example: USDA Farm database
- Imports 3 MDB + XLS file
- GDX file: ± 5 million records (raw data), 65 symbols

  File                               Size (bytes)
  FARMLandWaterResourcesDraft.mdb      80,650,240
  FARMResourcesProductionDraft.mdb     429,346,86
  GTAP54-NonLandIntensive.mdb          303,66,000
  FIPS2&FAOCountries.xls                   29,84
  farm.gdx                                9,8,434
  farm.gdx (compressed)                52,924,664

- Good, compact way to distribute and reuse large data sets (input or output)
- Aggregation is easier in GAMS than in Access!

Example: USDA REAP model: combines data from different sources

Conversion MDB -> GDX

$onecho > cmd.txt
I=FeedGrainsData.mdb
X=FeedGrainsData.gdx
q1=select commodity from tblcommodities
s1=commodity
q2=select attribute from tblattributes
s2=attribute
q3=select period from tbltimeperiods
s3=period
q4=select unit from tblunits
s4=unit
q5=select distinct(iif(isnull(isource),'blank',isource)) \
   from tblfg_update where \
   not isnull(year)
s5=isource
q6=select geo from tblgeography
s6=geocode
q7=select commodity,attribute,unit,iif(isnull(isource),'blank',isource),geocode,year,period,value \
   from tblfg_update where \
   not isnull(year)
p7=feedgrains
$offecho
$call mdb2gms @cmd.txt

Typical Problems
- NULLs
- Duplicate records
- Multi-valued tables
- More difficult processing: get the latest available number
  - Difficult both in SQL and in GAMS
- Advantage of SQL: we can repair a number of problems on the fly
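The "repair on the fly" and "latest available number" patterns can be sketched in SQL. The table below is a hypothetical miniature of the feed-grains data (not the actual Access database), using Python's built-in sqlite3 instead of MDB:

```python
import sqlite3

# Hypothetical mini version of tblfg_update: source may be NULL,
# and the value for the most recent year may be missing.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE fg (commodity TEXT, year INT, source TEXT, value REAL)")
con.executemany("INSERT INTO fg VALUES (?,?,?,?)", [
    ("corn",  2005, "usda", 10.0),
    ("corn",  2006, None,   11.0),
    ("corn",  2007, None,   None),   # value missing for the latest year
    ("wheat", 2005, "usda",  7.0),
])

# Repair NULLs on the fly, as the mdb2gms queries do with
# iif(isnull(isource),'blank',isource):
rows = con.execute(
    "SELECT commodity, year, COALESCE(source,'blank'), value "
    "FROM fg WHERE value IS NOT NULL").fetchall()

# 'Latest available number' per commodity: the max year with a non-NULL value.
latest = dict(con.execute(
    "SELECT commodity, value FROM fg f "
    "WHERE year = (SELECT MAX(year) FROM fg "
    "              WHERE commodity = f.commodity AND value IS NOT NULL)"
).fetchall())
print(latest)   # {'corn': 11.0, 'wheat': 7.0}
```

The correlated subquery is the part that is awkward to express in GAMS parameter assignments, which is the point the slide makes.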

Data manipulation
- Role of data manipulation in a modeling language:
  - OML: no data manipulation at all; do it in your data source environment (e.g. Excel)
  - AMPL: more extensive data manipulation facilities; powerful if it fits within the declarative paradigm
  - GAMS: extensive use of data manipulation; procedural

Policy Evaluation Models
- Often have serious data handling requirements
  - Aggregation/disaggregation
  - Estimation/calibration
  - Simulation
- Examples (LOC = lines of code, comments excluded):

  Model          LOC     LOC for equations
  Polysys        22576   < 500
  Impact2000     7284    0
  IntegratedIW5  2077    < 500

Sparse Data: matrix multiplication

AMPL:
  param N := 250;
  set I := {1..N};
  param A{i in I, j in I} := if (i=j) then 1;
  param B{i in I, j in I} := if (i=j) then 1;
  param C{i in I, j in I} := sum{k in I} A[i,k]*B[k,j];
  param s := sum{i in I, j in I} C[i,j];
  display s;
  end;

GAMS:
  set i /i1*i250/;
  alias (i,j,k);
  parameter A(i,j),B(i,j),C(i,j);
  A(i,i) = 1;
  B(i,i) = 1;
  C(i,j) = sum(k, A(i,k)*B(k,j));
  scalar s;
  s = sum((i,j),C(i,j));
  display s;
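The point of the benchmark, that a sparse execution engine only touches the stored nonzeros, can be sketched outside GAMS/AMPL as well. A minimal illustration with NumPy (dense) and SciPy (sparse), for the same 250x250 identity-matrix product:

```python
import time
import numpy as np
from scipy import sparse

n = 250

# Dense: every one of the n*n entries is stored and multiplied
Ad, Bd = np.eye(n), np.eye(n)
t0 = time.perf_counter()
Cd = Ad @ Bd
dense_t = time.perf_counter() - t0

# Sparse: only the n diagonal nonzeros are stored and touched
As = sparse.identity(n, format="csr")
Bs = sparse.identity(n, format="csr")
t0 = time.perf_counter()
Cs = As @ Bs
sparse_t = time.perf_counter() - t0

s = Cs.sum()      # the same scalar the AMPL/GAMS fragments display
print(s)          # 250.0
print(Cs.nnz)     # 250 stored entries instead of 62500
```

A sparse-aware engine (like GAMS' set-driven sum) scales with the number of nonzeros, while a dense loop scales with n^3, which is what the timing chart below shows.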

Timings

[Log-scale timing chart comparing gams-dense, gams-sparse, ampl-dense, ampl-sparse, glpk-dense, and glpk-sparse for problem sizes N up to 250]

Solving Linear Equations
- Solve Ax = b for x
- Often not a good idea to compute A^-1 explicitly
- In GAMS we can solve by specifying Ax = b as equations

Inverse
- If you really want the inverse of a matrix: solve for A^-1 in A·A^-1 = I
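Both points, solving rather than inverting, and obtaining the inverse by solving A·X = I when it really is needed, can be sketched numerically (a NumPy illustration, not the GAMS formulation itself):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))
b = rng.standard_normal(50)

# Preferred: solve Ax = b directly; one factorization, better accuracy
x = np.linalg.solve(A, b)

# If the inverse is truly needed: solve A X = I column by column,
# which is exactly the A·A^-1 = I formulation above
Ainv = np.linalg.solve(A, np.eye(50))

assert np.allclose(A @ x, b)
assert np.allclose(A @ Ainv, np.eye(50))
```

Note that `x` and `Ainv @ b` agree, but the direct solve does one triangular back-substitution instead of a full matrix-vector product against an explicitly formed inverse.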

Speed Up: Advanced Basis
- We can provide an advanced basis so the calculation takes 0 simplex iterations:

    inv.m(i,j) = 0;        (var: basic)
    inverse.m(i,j) = 1;    (equ: non-basic)

             method=1    method=2
    n=50       0.637       0.433
    n=100      8.267       4.036
    n=200     33.236      53.395

External Solver using GDX

  execute_unload 'a.gdx', i, a;
  execute '=invert.exe a.gdx i a b.gdx pinva';
  parameter pinva(i,j);
  execute_load 'b.gdx', pinva;

Flow: GAMS writes i, a(i,j) to a.gdx -> invert.exe -> pinva(i,j) in b.gdx -> GAMS reads it back.
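The same file-based round trip, dump data, run an external tool as a separate process, load the result back, can be sketched in Python. Here .npy files stand in for GDX and a throwaway subprocess stands in for invert.exe (both are illustrative substitutions, not the actual tools):

```python
import os
import subprocess
import sys
import tempfile

import numpy as np

tmp = tempfile.mkdtemp()
a_file = os.path.join(tmp, "a.npy")
b_file = os.path.join(tmp, "b.npy")

A = np.array([[4.0, 1.0], [2.0, 3.0]])
np.save(a_file, A)                      # analogue of execute_unload 'a.gdx',i,a;

# Stand-in for invert.exe: a separate process that reads the input file,
# inverts the matrix, and writes the output file.
inverter = (
    "import sys, numpy as np; "
    "a = np.load(sys.argv[1]); "
    "np.save(sys.argv[2], np.linalg.inv(a))"
)
subprocess.run([sys.executable, "-c", inverter, a_file, b_file], check=True)

Ainv = np.load(b_file)                  # analogue of execute_load 'b.gdx',pinva;
assert np.allclose(A @ Ainv, np.eye(2))
```

The design point is the same as with GDX: the external tool needs no knowledge of the modeling system, only of the exchange file format.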

Test Matrix: Pei matrix
- a(i,j) = 1 + α on the diagonal, 1 elsewhere

             method=1    method=2    method=3
    n=50       0.637       0.433       0.027
    n=100      8.267       4.036       0.055
    n=200     33.236      53.395       0.8
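The Pei matrix is a popular inversion test case because its inverse is known in closed form, so the external solver's output can be checked exactly. A small NumPy sketch (α = 2 is an arbitrary choice here):

```python
import numpy as np

n, alpha = 50, 2.0

# Pei matrix: 1 + alpha on the diagonal, 1 everywhere else
P = np.ones((n, n)) + alpha * np.eye(n)

Pinv = np.linalg.inv(P)

# Closed-form inverse: (1/alpha) * (I - J/(alpha+n)), J the all-ones matrix.
# Check: (alpha*I + J)(I - J/(alpha+n))/alpha = I, using J @ J = n*J.
J = np.ones((n, n))
closed = (np.eye(n) - J / (alpha + n)) / alpha

assert np.allclose(Pinv, closed)
assert np.allclose(P @ Pinv, np.eye(n))
```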

Other tools
- Cholesky
- Eigenvalues
- Eigenvectors

Max Likelihood Estimation
- An NLP solver can find the optimal values: the estimates
- But to get covariances we need to:
  - Get the Hessian
  - Invert this Hessian
- We can do this now in GAMS
  - Klunky, but at least we can now do it
  - GDX is used several times

MLE Estimation Example

* Data:
* Number of days until the appearance of a carcinoma in 19
* rats painted with the carcinogen DMBA.
set i /i1*i19/;
table data(i,*)
        days   censored
  i1     143
  i2     164
  i3     188
  i4     188
  i5     190
  i6     192
  i7     206
  i8     209
  i9     213
  i10    216
  i11    220
  i12    227
  i13    230
  i14    234
  i15    246
  i16    265
  i17    304
  i18    216        1
  i19    244        1
;
set k(i) 'not censored';
k(i)$(data(i,'censored')=0) = yes;
parameter x(i);
x(i) = data(i,'days');
scalars
  p 'number of observations'
  m 'number of uncensored observations'
;
p = card(i);
m = card(k);
display p,m;

MLE Estimation

*-------------------------------------------------------------------------------
* estimation
*-------------------------------------------------------------------------------
scalar theta 'location parameter' /0/;
variables
  sigma  'scale parameter'
  c      'shape parameter'
  loglik 'log likelihood'
;
equation eloglike;
c.lo = 0.001;
sigma.lo = 0.001;
eloglike.. loglik =e= m*log(c) - m*c*log(sigma)
                      + (c-1)*sum(k, log(x(k)-theta))
                      - sum(i, ((x(i)-theta)/sigma)**c);
model mle /eloglike/;
solve mle maximizing loglik using nlp;

Get Hessian

*-------------------------------------------------------------------------------
* get hessian
*-------------------------------------------------------------------------------
option nlp=convert;
$onecho > convert.opt
hessian
$offecho
mle.optfile=1;
solve mle minimizing loglik using nlp;
*
* gams cannot add elements at runtime, so we declare the necessary elements here
*
set dummy /e1,x1,x2/;
parameter h(*,*,*) '-hessian';
execute_load "hessian.gdx", h;
display h;
set j /sigma,c/;
parameter h0(j,j);
h0('sigma','sigma') = h('e1','x1','x1');
h0('c','c')         = h('e1','x2','x2');
h0('sigma','c')     = h('e1','x1','x2');
h0('c','sigma')     = h('e1','x1','x2');
display h0;

Invert Hessian

*-------------------------------------------------------------------------------
* invert hessian
*-------------------------------------------------------------------------------
execute_unload "h.gdx", j, h0;
execute "=invert.exe h.gdx j h0 invh.gdx invh";
parameter invh(j,j);
execute_load "invh.gdx", invh;
display invh;

Normal Quantiles

*-------------------------------------------------------------------------------
* quantile of normal distribution
*-------------------------------------------------------------------------------
* find
*   p = 0.05
*   q = probit(1-p/2)
scalar prob /0.05/;
* we don't have the inverse error function, so we calculate it
* using a small cns model
equations e;
variables probit;
e.. errorf(probit) =e= 1 - prob/2;
model inverterrorf /e/;
solve inverterrorf using cns;
display probit.l;
* verification in R:
*   > qnorm(0.975)
*   [1] 1.959964
* or just use 1.96
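The quantity the small CNS model recovers is the standard normal quantile at 1 - p/2 (GAMS' errorf is the standard normal CDF). A quick cross-check of the same value in SciPy, also via the inverse error function route:

```python
import math

from scipy.special import erfinv
from scipy.stats import norm

p = 0.05
q = norm.ppf(1 - p / 2)            # probit(0.975)

# Same quantity through the inverse error function:
# probit(1 - p/2) = sqrt(2) * erfinv(1 - p)
q2 = math.sqrt(2) * erfinv(1 - p)

print(round(q, 6))                 # 1.959964
assert abs(q - 1.959964) < 1e-6
assert abs(q - q2) < 1e-9
```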

Finally: confidence intervals

*-------------------------------------------------------------------------------
* calculate standard errors and confidence intervals
*-------------------------------------------------------------------------------
parameter result(j,*);
result('c','estimate')     = c.l;
result('sigma','estimate') = sigma.l;
result(j,'stderr')  = sqrt(abs(invh(j,j)));
result(j,'conf lo') = result(j,'estimate') - probit.l*result(j,'stderr');
result(j,'conf up') = result(j,'estimate') + probit.l*result(j,'stderr');
display result;

----     68 PARAMETER result

            estimate      stderr     conf lo     conf up
  sigma      234.319       9.646     215.413     253.224
  c            6.083       1.068       3.989       8.177
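The whole pipeline, maximize the censored Weibull log-likelihood, form the observed-information Hessian, invert it for standard errors, can be cross-checked numerically. A sketch with SciPy, using the same data and log-likelihood as the GAMS model (the finite-difference Hessian plays the role of convert's hessian option; this is an illustration, not the GAMS tool chain):

```python
import numpy as np
from scipy.optimize import minimize

# Days to carcinoma for 19 rats; the last two observations are right-censored
days = np.array([143, 164, 188, 188, 190, 192, 206, 209, 213, 216,
                 220, 227, 230, 234, 246, 265, 304, 216, 244], float)
censored = np.zeros(19, bool)
censored[17:] = True
x, m = days, int((~censored).sum())

def negloglik(theta):
    sigma, c = theta
    if sigma <= 0 or c <= 0:
        return np.inf
    # Same expression as eloglike (theta = 0): censored points
    # contribute only the survival term -(x/sigma)^c
    ll = (m * np.log(c) - m * c * np.log(sigma)
          + (c - 1) * np.log(x[~censored]).sum()
          - ((x / sigma) ** c).sum())
    return -ll

res = minimize(negloglik, x0=[230.0, 5.0], method="Nelder-Mead",
               options={"xatol": 1e-8, "fatol": 1e-10})
sigma_hat, c_hat = res.x

def hessian(f, t, rel=1e-5):
    # Central finite-difference Hessian with per-coordinate step sizes
    t = np.asarray(t, float)
    n = len(t)
    h = rel * np.maximum(1.0, np.abs(t))
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h[i]
            ej = np.zeros(n); ej[j] = h[j]
            H[i, j] = (f(t + ei + ej) - f(t + ei - ej)
                       - f(t - ei + ej) + f(t - ei - ej)) / (4 * h[i] * h[j])
    return H

# Covariance = inverse of the observed information (Hessian of -loglik)
cov = np.linalg.inv(hessian(negloglik, res.x))
se = np.sqrt(np.diag(cov))
print(sigma_hat, c_hat)   # ~234.3, ~6.08
print(se)                 # ~9.6, ~1.07
```

The estimates and standard errors should agree with the GAMS `result` table above to the displayed precision.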

New developments
- Instead of calling external programs with a GDX interface, call a user-provided DLL
- With simplified syntax:

    parameter A(i,j),B(j,i);
    A(i,j) = ...;
    scalar status;
    BridgeCall('gamslapack', 'invert', A, B, status);

Behind the scenes
- Map GAMS data to Fortran, C, ...
- Deal with calling conventions (stdcall)
- Specified in a small spec file

file bridgelibrary.ini:
  [bridge]
  id=gams bridge library
  lib=gamslapack
  lib2=gamsgsl

For subroutine invert(a,n,b,info), beginning of file gamslapack.ini:
  [Library]
  Version=1
  Description=GAMS interface to LAPack
  LibName=gamslapack
  DLLName=gamslapack
  Storage=F
  [invert]
  name=invert
  i1=q        // param1 = square matrix
  id1=i2
  i2=d        // param2 = n
  o1=q        // param3 = square matrix
  od1=i1,2
  od2=i1,
  o2=n        // param4 = info
  status=o2
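The BridgeCall idea (hand the data straight to a compiled LAPACK routine instead of round-tripping through files) is essentially what SciPy's generated LAPACK wrappers do for Python. A small sketch of the same invert-via-LAPACK call, using the real `dgetrf`/`dgetri` wrappers as an analogue of the gamslapack bridge:

```python
import numpy as np
from scipy.linalg import lapack

# Fortran-ordered input, matching LAPACK's column-major convention
A = np.array([[4.0, 1.0], [2.0, 3.0]], order="F")

# LU factorization (dgetrf), then inversion from the factors (dgetri);
# info = 0 signals success, like the 'param4 = info' slot in the spec file
lu, piv, info = lapack.dgetrf(A)
assert info == 0
Ainv, info = lapack.dgetri(lu, piv)
assert info == 0

assert np.allclose(A @ Ainv, np.eye(2))
```

The wrapper layer does exactly what the spec file describes: map the caller's data structures onto the Fortran arguments and hand back outputs plus a status code.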

Gets more exciting when
- We can parse subroutine headers in libraries and generate this automatically
- This will open up access to:
  - Numerical and statistical libraries
  - Current tools as functions (sql2gms, LS solver, ...)
  - Etc.
- Longer term: also for equations/derivatives