Components for Analytic Prototyping and Production Deployment

PyIMSL Studio Features
A White Paper by Visual Numerics
August 2009
www.vni.com

Visual Numerics, a Rogue Wave Software Company

© 2009 by Visual Numerics, Inc. All Rights Reserved.
Printed in the United States of America

Publishing History:
February 2009: Initial publication
August 2009: Update

Trademark Information
Visual Numerics, IMSL and PV-WAVE are registered trademarks. JMSL, TS-WAVE, JWAVE, and PyIMSL are trademarks of Visual Numerics, Inc., in the U.S. and other countries. All other product and company names are trademarks or registered trademarks of their respective owners.

The information contained in this document is subject to change without notice. Visual Numerics, Inc. makes no warranty of any kind with regard to this material, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose. Visual Numerics, Inc. shall not be liable for errors contained herein or for incidental, consequential, or other indirect damages in connection with the furnishing, performance, or use of this material.

TABLE OF CONTENTS
Abstract
Prototyping versus Production
Why Prototype in a Dynamic Language
Why Python
PyIMSL Studio Components
  Simple Installation
  Python Language
  NumPy
  Industry Standard Analytics
  Data Tools
  Other Python Components
  Documentation
  Support
Summary

Abstract

This paper explores the advantages of using the Python dynamic language and the PyIMSL Studio development environment for analytic model development, and how they can improve the productivity of the people involved both in prototype modeling and in deploying analytic applications to production. PyIMSL Studio is the first and only commercially available numerical analysis application development environment designed for deploying mathematical and statistical prototype models into production applications. This paper is intended for readers who are familiar with creating production applications that leverage analytical models, math or statistics, and who typically deploy in Python code or in C code using the IMSL C Numerical Library.

Prototyping versus Production

Many organizations that develop analytic applications today follow a process that includes creating a prototype model before developing the production application. Many development and numerical analysis tools are available for the prototype stage, where the goals are to explore and refine analytic techniques using sample data, develop requirements for production, validate the viability of the solution, and achieve sign-off to proceed with production development. Production deployment often involves additional development to package the prototype code into an application or embeddable component that handles data from production sources, delivers the performance required in production, handles errors properly, and is fully tested.

As an open language supported on many platforms, Python can itself be used for production deployment: available features and components allow standalone or web-based Python applications to be developed and deployed. However, many prototype models eventually become part of production applications written in development languages like C, C++, C#/.NET, Java or Fortran.
To turn a model into a component of a production application in these languages, the modeler often must rewrite the prototype in a development language or hand it off to an implementation team to do the work. In many cases, the rewrite uses a separate native library for the analytics in the production application, and several testing steps are then needed to ensure that the numeric results from the prototype match those of the production application. This paper shows why Python and PyIMSL Studio are a good choice for such prototype modeling work, and how PyIMSL Studio can also be leveraged for production deployment, in Python or other languages.
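The testing step described above usually comes down to comparing prototype and production outputs within a tolerance rather than for exact equality, because two implementations of the same analytic can legitimately differ in the last few bits of a floating-point result. The following is a minimal sketch, assuming both stages can export their results as flat lists of floats; the function name `result_mismatches` and its default tolerances are illustrative choices, not part of PyIMSL Studio.

```python
import math


def result_mismatches(prototype, production, rel_tol=1e-9, abs_tol=1e-12):
    """Return (index, prototype_value, production_value) for every pair outside tolerance.

    The default tolerances are illustrative; suitable values depend on the model.
    """
    if len(prototype) != len(production):
        raise ValueError("prototype and production result vectors differ in length")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(prototype, production))
        if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
    ]


# Summing 0.1 ten times yields 0.9999999999999999 rather than exactly 1.0,
# yet the two results agree well within the tolerance.
prototype_results = [sum([0.1] * 10), 2.0]
production_results = [1.0, 2.0]
print(result_mismatches(prototype_results, production_results))
```

Reporting every mismatch, rather than stopping at the first one, makes it easier to see whether a discrepancy is isolated or systematic.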

Why Prototype in a Dynamic Language

Dynamic languages are well suited for prototype work for a number of reasons:

- Dynamic languages offer rapid code development, often needing 2 to 5 times fewer lines of code than languages like C/C++, Java and C# to produce identical functionality.
- There is less syntactic decoration in dynamic languages. Variables are typically dynamically typed and do not require formal type declarations or function templates.
- Dynamic languages are higher level and do not involve pointer manipulation and referencing, and their operators often work on a wider range of data objects. The resulting code can be much more readable. This comes at a cost, of course: usually some computing overhead, and some subtle programming errors that a compiler would catch surface only at runtime.
- Dynamic languages can usually be run from an interactive command prompt, allowing ad hoc use as a calculator or interactive debugging of code.
- Dynamic languages do not require the edit-compile-test cycle of development. Code can be changed and immediately executed, or entered interactively.

All of these features allow for rapid development, code that is easy to read and maintain, and a shorter learning curve for the domain experts usually involved in prototype modeling, who are typically not professional developers.

Why Python

Python is a leading open source dynamic language that is well suited for analytic prototype modeling for a number of reasons:

- It is a well-rounded language that can be used for either procedural or object-oriented development. Other dynamic languages are often more special-purpose, with features that address certain kinds of problems but are not balanced for general programming.
- It is an open rather than a proprietary language, allowing for greater sharing of tools and analytic code across a wider audience of users.
- There is a large number of open source toolkits for analytical modeling with Python, the result of more than a decade of strong adoption and contributions by the scientific community.
- It is a dynamically typed language with a simple syntax that makes code easy to read and understand.
- The industry-standard NumPy package adds array-based operations suitable for efficient storage and manipulation of large multidimensional arrays. NumPy includes a simple syntax to index, subset and perform operations on arrays, and is efficient in memory use and performance.

While a number of open source analytic libraries and tools are available for Python, the PyIMSL wrappers (part of the PyIMSL Studio environment) offer the most comprehensive collection of rich analytic and statistical techniques for both prototype modeling and production deployment, regardless of the final deployed hardware or operating system.
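As a small illustration of the NumPy indexing, subsetting and whole-array operations mentioned above, here is a sketch that uses only standard NumPy calls; the array contents are arbitrary example values.

```python
import numpy as np

# A 3x4 array holding 0.0 .. 11.0, row by row.
a = np.arange(12, dtype=float).reshape(3, 4)

scaled = a * 2.5              # every element multiplied by a scalar, no explicit loop
row = a[1]                    # the second row
block = a[:2, 1:3]            # subset: first two rows, columns 1 and 2
big = a[a > 6]                # boolean mask selects the elements greater than 6
col_means = a.mean(axis=0)    # operation applied down each column

print(block)
print(big)
print(col_means)
```

The equivalent C code would need explicit loops and index arithmetic for each of these operations, which is part of why array-oriented prototyping tends to be shorter.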

Python Weaknesses

There are a number of general challenges in adopting open source languages and tools, and Python is no exception. These include:

1. Installing the many components available for Python can be tricky, and may involve compiling code (especially on non-Windows platforms).
2. Interoperability among different open source components is complicated by frequent releases, which can make it difficult to maintain a stable environment.
3. The quality of documentation for open source components varies widely. Online resources and postings often describe features or problems of out-of-date versions of components.
4. The lack of dedicated support is a risk when Python and its many open source tools are used in mission-critical applications.

PyIMSL Studio Components

PyIMSL Studio combines the Python language and a selection of robust Python tools with the advanced analytics of the IMSL C Numerical Library. It addresses the Python weaknesses described above by providing a tested, documented, fully supported and easy-to-install Python environment. Gaps in data I/O and cleansing are filled with additional functionality from Visual Numerics. The components in PyIMSL Studio provide the functionality that prototype modelers need, as well as the analytic functionality in C libraries needed to deploy into production environments. Components and services in PyIMSL Studio include:

Simple Installation

A single installation program installs Python and all PyIMSL Studio components. The components are precompiled and fully tested on each supported platform.

Python Language

The Python language is integrated as part of PyIMSL Studio. A simple example of code to calculate prime numbers shows what the language looks like:
[Code example: calculating prime numbers in Python]
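A prime-number program of this kind might look like the following sketch, which uses the Sieve of Eratosthenes; the function name `primes_up_to` is illustrative, and the choice of algorithm is an assumption rather than a reproduction of the paper's example.

```python
import math


def primes_up_to(n):
    """Return all primes <= n using the Sieve of Eratosthenes."""
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    # Only factors up to sqrt(n) need to be checked.
    for p in range(2, math.isqrt(n) + 1):
        if is_prime[p]:
            # Cross off the multiples of p, starting at p*p
            # (smaller multiples were already crossed off by smaller primes).
            is_prime[p * p::p] = [False] * len(range(p * p, n + 1, p))
    return [i for i, flag in enumerate(is_prime) if flag]


print(primes_up_to(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

Note the absence of type declarations, the slice assignment that updates many list elements at once, and the list comprehension that builds the result: the kinds of brevity discussed in the section on dynamic languages.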

NumPy

The included NumPy package provides data objects and a set of modules for powerful and efficient manipulation of data arrays. It is the de facto standard for array and matrix algebra in Python. For example, NumPy can multiply every element in an array by a scalar in a single operation.

Industry Standard Analytics

The PyIMSL package within PyIMSL Studio provides wrappers to the IMSL C Library. These wrappers provide a simple and flexible interface to the underlying C functionality and handle all translation of Python data types and structures into the correct representation used in C. The API for these functions mirrors the C API, making it easy to translate Python code to C code for production use.
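This section does not describe how the PyIMSL wrappers are built internally. As a general illustration of what translating Python data types into the representation used in C involves, here is a sketch that uses the standard library's `ctypes` module to call two functions from the system C math library (assumed to be available, as it is on typical Linux and macOS systems). The wrapper names `c_pow` and `c_frexp` are hypothetical and are not part of PyIMSL.

```python
import ctypes
import ctypes.util

# Locate the C math library (e.g. "libm.so.6" on Linux). If it cannot be found,
# CDLL(None) falls back to symbols already loaded into the process (POSIX only).
_libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C prototype: double pow(double x, double y);
_libm.pow.argtypes = (ctypes.c_double, ctypes.c_double)
_libm.pow.restype = ctypes.c_double

# Declare the C prototype: double frexp(double x, int *exp);
_libm.frexp.argtypes = (ctypes.c_double, ctypes.POINTER(ctypes.c_int))
_libm.frexp.restype = ctypes.c_double


def c_pow(x, y):
    """Wrapper whose argument order mirrors the C function pow(x, y)."""
    return _libm.pow(float(x), float(y))


def c_frexp(x):
    """Wrap frexp: the C out-parameter *exp becomes part of a Python return tuple."""
    exponent = ctypes.c_int()
    mantissa = _libm.frexp(float(x), ctypes.byref(exponent))
    return mantissa, exponent.value


print(c_pow(2, 10))   # 1024.0
print(c_frexp(8.0))   # (0.5, 4), since 8.0 == 0.5 * 2**4
```

Because each wrapper keeps the C function's name and argument order, code written against it reads much like the corresponding C call, which is the property the paper highlights for moving from a Python prototype to C.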

Here is a Python code fragment that trains a neural network and creates a forecast, followed by the code to accomplish the same task in C using the IMSL C Library:
[Code examples: neural network training and forecasting, in Python with PyIMSL and in C with the IMSL C Library]

Functional areas in the IMSL Libraries include the following high-level areas, which together represent more than 450 available routines:

Mathematics: Matrix Operations, Linear Algebra, Eigensystems, Interpolation & Approximation, Quadrature, Differential Equations, Nonlinear Equations, Optimization, Special Functions, Finance & Bond Calculations, Genetic Algorithms

Statistics: Basic Statistics, Time Series & Forecasting, Nonparametric Tests, Correlation & Covariance, Data Mining, Regression, Analysis of Variance, Transforms, Goodness of Fit, Distribution Functions, Random Number Generation, Neural Networks
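To show the general shape of the neural network forecasting task without assuming anything about the PyIMSL or IMSL C API, here is a from-scratch sketch in plain Python: a network with one tanh hidden layer and a linear output, trained by full-batch gradient descent on lagged values of a series and then used for a one-step-ahead forecast. The class and function names, layer sizes, learning rate, number of lags and the sine-wave data are all illustrative assumptions; this is not how the IMSL neural network routines are implemented.

```python
import math
import random


def make_windows(series, lags):
    """Turn a series into (previous `lags` values, next value) training pairs."""
    xs = [series[t - lags:t] for t in range(lags, len(series))]
    ys = [series[t] for t in range(lags, len(series))]
    return xs, ys


class TinyMLP:
    """One tanh hidden layer and a linear output, trained by full-batch gradient descent."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        return h, sum(w * hj for w, hj in zip(self.w2, h)) + self.b2

    def predict(self, x):
        return self._forward(x)[1]

    def mse(self, xs, ys):
        return sum((self.predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

    def train(self, xs, ys, epochs=300, lr=0.05):
        n = len(xs)
        for _ in range(epochs):
            g_w1 = [[0.0] * len(row) for row in self.w1]
            g_b1 = [0.0] * len(self.b1)
            g_w2 = [0.0] * len(self.w2)
            g_b2 = 0.0
            for x, y in zip(xs, ys):
                h, pred = self._forward(x)
                d_out = 2.0 * (pred - y) / n  # d(MSE)/d(pred) for this sample
                g_b2 += d_out
                for j, hj in enumerate(h):
                    g_w2[j] += d_out * hj
                    d_hid = d_out * self.w2[j] * (1.0 - hj * hj)  # back-propagate through tanh
                    g_b1[j] += d_hid
                    for i, xi in enumerate(x):
                        g_w1[j][i] += d_hid * xi
            # Apply the accumulated gradients after the full pass over the data.
            self.b2 -= lr * g_b2
            for j in range(len(self.w2)):
                self.w2[j] -= lr * g_w2[j]
                self.b1[j] -= lr * g_b1[j]
                for i in range(len(self.w1[j])):
                    self.w1[j][i] -= lr * g_w1[j][i]


series = [math.sin(0.3 * t) for t in range(60)]
xs, ys = make_windows(series, lags=4)
model = TinyMLP(n_in=4, n_hidden=6)
before = model.mse(xs, ys)
model.train(xs, ys)
after = model.mse(xs, ys)
forecast = model.predict(series[-4:])  # one-step-ahead forecast of the next value
print(f"training MSE {before:.4f} -> {after:.4f}, forecast {forecast:.4f}")
```

A library routine would typically add features this sketch omits, such as input scaling, a held-out validation set and a better optimizer.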

Data Tools

Tools to import data from ASCII files, Excel spreadsheets and ODBC sources, and tools to filter and cleanse data, are included. The tools developed by Visual Numerics are available in both Python and C. These data tools include:

asciiread: A PyIMSL Studio routine to read ASCII data files oriented in rows or columns. Its many keyword options make it the most flexible tool of its kind in Python. It is available for both Python and C. For example, it can read two selected columns from a file.

impute: A PyIMSL Studio routine to locate missing values and optionally replace them with estimated values using one of six available algorithms. It is available for both Python and C. For example, it can replace missing values with the geometric mean of their nearest four neighbors.

PyODBC: A Python module for database access using the Microsoft ODBC interface. It can be used, for example, to read a column from a database table.

xlrd: A Python module for reading data from Microsoft Excel files. It can inquire about the sheets in a workbook and read the data from a spreadsheet.
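The read-then-cleanse workflow described for asciiread and impute can be sketched with the standard library alone. This is not the asciiread or impute API, whose keyword options are not shown here; the helper names `read_columns` and `impute_geometric`, the "NA" missing-value marker, the "#" comment convention, and the interpretation of "nearest four neighbors" as the four closest non-missing values by position in a one-dimensional series are all assumptions made for illustration. A geometric mean also requires the neighboring values to be positive.

```python
import io
import math
import statistics


def read_columns(text, columns, missing="NA"):
    """Read selected whitespace-separated columns from row-oriented ASCII text.

    Tokens equal to `missing` become math.nan so they can be imputed later.
    """
    result = [[] for _ in columns]
    for line in io.StringIO(text):
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comment lines
        for out, col in zip(result, columns):
            token = fields[col]
            out.append(math.nan if token == missing else float(token))
    return result


def impute_geometric(values, k=4):
    """Replace NaNs with the geometric mean of the k nearest non-missing values (by index)."""
    known = [i for i, v in enumerate(values) if not math.isnan(v)]
    filled = list(values)  # leave the input list unchanged
    for i, v in enumerate(values):
        if math.isnan(v):
            nearest = sorted(known, key=lambda j: abs(j - i))[:k]
            filled[i] = statistics.geometric_mean([values[j] for j in nearest])
    return filled


data = """\
# time  value  other
1  2.0  9
2  NA   9
3  8.0  9
4  4.0  9
5  1.0  9
"""
t, y = read_columns(data, columns=(0, 1))  # read only the first two columns
y_filled = impute_geometric(y)
print(y_filled)  # the NA becomes (2 * 8 * 4 * 1) ** 0.25, about 2.828
```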

Other Python Components

matplotlib/pylab: Python analytical charting components. matplotlib is an extensive plotting module for Python that can create publication-quality graphs. Notable features of this library include alpha transparency layers, anti-aliased graphics, the ability to integrate into wxPython and Tkinter GUIs, and the ability to create a variety of image formats, including PostScript, SVG and PDF. matplotlib has an object-oriented interface and a simplified procedural interface (pylab). Built-in interactivity includes the ability to zoom, pan and export graphics. Here is a sample line plot created with pylab:
[Figure: sample line plot created with pylab]

Tkinter/wxPython: two popular toolkits for creating Python user interfaces. Below is an example of a demo application provided with PyIMSL Studio that was built using wxPython widgets:
[Figure: PyIMSL Studio demo application built with wxPython widgets]

IPython/Eclipse: a powerful command-line interface and a full-featured Integrated Development Environment (IDE) for Python. When an interactive shell environment is desired, IPython enhances the functionality available from the basic Python shell. Eclipse with the PyDev plug-in provides a more formal development environment with step-through debugging, code completion, syntax highlighting and many other useful features.

Documentation

The PyIMSL Studio User Guide provides an introduction to Python and to the included components, and shows how to use them together. It has in-depth tutorials on using Python for prototype analytic development with real-world problems. It offers an easy way to become productive quickly when prototyping analytic code in Python, and serves as a quick reference and a source of example code.

Additionally, complete API documentation is provided for the PyIMSL wrappers to the IMSL C Numerical Library, including API descriptions, background mathematics, references and example code.

Support

World-class technical support for both the Visual Numerics components and the bundled Python language and open source components is provided by phone, email and online forums.

Summary

This paper described how Python and PyIMSL Studio address the needs of prototype modeling in a stable, tested, supported and documented environment. PyIMSL Studio also eases deployment of code into production environments by removing the gap introduced when different analytics are used in prototype and production work. With PyIMSL Studio, prototype models become part of production applications more quickly and with less rework, cost, risk and complexity.

For more information or to request an evaluation copy, visit the PyIMSL Studio area of the Visual Numerics website (http://www.vni.com/products/imsl/pyimslstudio.php).