Efficiency Considerations of PERL and Python in Distributed Processing




Roger Eggen (presenter)
Computer and Information Sciences
University of North Florida
Jacksonville, FL 32224
ree@unf.edu
904.620.1326
904.620.2988 (fax)

Maurice Eggen
Department of Computer Science
Trinity University
San Antonio, TX 78212
meggen@trinity.edu
210.999.7487
210.999.7477 (fax)

Abstract: With the resurgence of interest in distributed processing and programming of distributed systems, researchers are exploring methods and techniques for facilitating programming of such systems. In this paper we demonstrate the ease of use and performance characteristics of socket communication for the scripting languages PERL and Python. These languages enhance productivity when developing distributed applications, although the performance tradeoffs incurred relative to other popular languages must be considered.

Keywords: PERL, Python, Java, Distributed and Parallel Processing

1 Introduction

Distributed and parallel processing is one of the most important areas of research in modern computing. From the Web to large-scale numerical algorithms, distributed and parallel methods dominate processing applications. It is important for researchers and software engineers to have a wealth of tools available to them for programming such distributed systems. In this paper, we advocate languages that make parallel algorithm development easy, such as PERL and Python. We compare the speed and efficiency of these languages with that of Java, which is widely used for these applications.

2 Fundamentals

A distributed system is a collection of independent computers interconnected by a network and capable of cooperating to achieve the solution of a problem. The computers are considered loosely coupled if they do not share memory and have individual processors.
Important programming strides have been made in recent years to provide applications programming environments that make distributed computing more common and easier to implement than in the past. The message passing interface (MPI), parallel virtual machine (PVM), Common Object Request Broker Architecture (CORBA), Java's Remote Method Invocation (RMI), PERL's SOAP interface, and others have extended ordinary languages to provide methods (functions, procedures) facilitating distributed computing. PERL and Python are not often used in distributed applications, but these languages are extremely easy to program and have features that support parallel programming and Web interfaces rivaling any available language.

In this paper, we study sockets, the basis of all Web and distributed communication over a network. We compare the socket communication of PERL and Python to that of Java for ease of development and overall efficiency.

3 PERL

PERL is an open source, cross-platform programming language suited to a wide variety of applications. The word PERL is an acronym for Practical Extraction and Report Language. Used in both the public and private sectors, PERL runs on UNIX, Macintosh, Windows, and many more operating systems. PERL comprises the best features of many languages, including C, awk, sed, sh, and Basic. PERL works with HTML, XML, and other markup languages, supports Unicode, and supports both procedural and object-oriented paradigms. PERL is extensible. Because of its excellent text processing capabilities, PERL is perhaps the most widely used Web programming language. Applications written using the PERL language include, but are not limited to, the following: application servers; artificial intelligence algorithms; astronomy; audio; bioinformatics; compression and encryption; content management systems (for both small and large scale Web sites); database interfaces; date/time processing; e-commerce; email processing; GUI development; generic algorithms; graphing and charting; image processing; mathematical and statistical programming; natural language processing (in English, Chinese, Japanese, and Finnish, among others); network programming; operating-system integration with Windows, Solaris, Linux, Mac OS, etc.; PERL/Apache integration; spam identification; software testing; templating systems; prototyping for fast development; text processing; Web services, Web clients, and Web servers; XML/HTML processing [1].

The following code performs the socket communication in Perl:

    use Socket;

    my $host2     = 'node2';
    my $port2     = 2008;
    my $protocol2 = getprotobyname('tcp');
    socket(SOCK2, AF_INET, SOCK_STREAM, $protocol2) or die "socket: $!";
    # sockaddr_in expects a packed address, so resolve the host name first
    my $dest_addr2 = sockaddr_in($port2, inet_aton($host2));
    connect(SOCK2, $dest_addr2) or die "connect: $!";
    print SOCK2 $x;    # $x holds the data to send

4 Python

Python is an easy to learn, powerful programming language. Efficient high-level data structures lead to a simple but effective approach to object-oriented programming. Python's elegant syntax and dynamic typing, together with its interpreted nature, make it an ideal language for scripting and rapid application development in many areas and on most platforms.

The Python interpreter and the extensive standard library are freely available in source or binary form for all major platforms from the Python Web site [2]. The same site also contains distributions of and pointers to many free third party Python modules, programs and tools, and additional documentation. Python is object oriented, with a class structure similar to Java's. In many respects, Python has the best features of Java and avoids much of the clutter, particularly in the input/output category. Python supports multiple inheritance, polymorphism, and all the other features expected of an object oriented language. The Python interpreter is easily extended with new functions and data types implemented in C or C++ (or other languages callable from C). Python is also suitable as an extension language for customizable applications [2].

The following code performs the socket communication in Python:

    import socket

    HOST = 'node2'    # The remote host
    PORT = 5002       # port for server
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((HOST, PORT))
    st = a.tostring()  # a is the array to send; tostring() is tobytes() in Python 3
    s.sendall(st)      # sendall() keeps writing until the whole buffer is sent

Again, the communication using Python is easy and direct.

5 Hardware

The hardware for evaluating socket efficiency consists of a Beowulf cluster of computers, all running RedHat Linux v9. The machines run at 0.5 GHz, have 512 megabytes of main memory, and are connected by gigabit fast Ethernet.

6 Software

The software consists of Python 2.2.2, RedHat 9 Linux core 3.2.2-4, PERL 5.8.3 installed with threading, and Java HotSpot 1.4.1_01. Only Linux machines were involved, to remove discrepancies caused by different operating systems.

7 Programming a Distributed System

Programming a distributed system, regardless of the programming environment used, involves several fundamentals.
Typically, one of three scenarios is used: boss/worker, where the boss distributes a unique portion of the work to each worker; worker crew, where each processor does essentially the same processing on different portions of the data, often referred to as single instruction multiple data (SIMD); and a pipeline, where each worker processes a portion of the data and passes that partial solution to the next worker. In this research we use the boss/worker environment as shown in Figure 1. A boss process defines the task to be accomplished and decides the method of distribution. Several worker processes do the actual work. Each worker starts by listening on its socket for the arrival of work from the boss. The boss is the controlling process that has the responsibility of dividing the task appropriately. A portion of the task is sent to each of the workers. The boss then waits to receive the completed tasks. The workers execute in parallel. Each receives its task from the boss without interaction with other workers.
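The boss/worker exchange described above can be sketched in Python 3 as follows. This is a minimal illustration, not the authors' program: the workers run as threads on the local machine instead of on separate cluster nodes, the length-prefixed message framing and the helper names (recv_exact, send_ints, recv_ints, worker, boss) are assumptions introduced for the example, and the built-in sorted() stands in for the real work.

```python
import socket
import struct
import threading
from array import array


def recv_exact(conn, n):
    """Read exactly n bytes from a socket, or raise if the peer closes early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed before message was complete")
        buf.extend(chunk)
    return bytes(buf)


def send_ints(conn, values):
    """Send a list of ints as a 4-byte length header followed by packed data."""
    payload = array("q", values).tobytes()
    conn.sendall(struct.pack("!I", len(payload)) + payload)


def recv_ints(conn):
    """Receive one length-prefixed list of ints."""
    (size,) = struct.unpack("!I", recv_exact(conn, 4))
    data = array("q")
    data.frombytes(recv_exact(conn, size))
    return data.tolist()


def worker(server_sock):
    """Worker: wait for work from the boss, process it, send the result back."""
    conn, _ = server_sock.accept()
    with conn:
        task = recv_ints(conn)
        send_ints(conn, sorted(task))  # stand-in for the real work


def boss(data, n_workers):
    """Boss: split the data, hand one portion to each worker, collect results."""
    listeners = []
    for _ in range(n_workers):
        srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        srv.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        srv.listen(1)
        threading.Thread(target=worker, args=(srv,), daemon=True).start()
        listeners.append(srv)

    chunk = (len(data) + n_workers - 1) // n_workers
    conns = []
    for i, srv in enumerate(listeners):
        c = socket.create_connection(srv.getsockname())
        send_ints(c, data[i * chunk:(i + 1) * chunk])
        conns.append(c)

    results = []
    for c in conns:  # the boss only waits; workers run concurrently
        results.append(recv_ints(c))
        c.close()
    for srv in listeners:
        srv.close()
    return results
```

Calling boss([5, 3, 9, 1, 7, 2], 3) hands two integers to each of three workers and returns one sorted portion per worker. On a real cluster each worker would be a separate process on its own node, and the boss would connect to known host names and ports instead of creating the listeners itself.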

Figure 1 Communication Dependency (diagram: one Boss connected to Worker, Worker, ..., Worker)

8 Evaluation

The purpose of this research is to determine how much a user gains (or gives up) to enjoy the ease of programming in PERL and Python in distributed applications. We chose a simple sorting algorithm. Sorting is well understood and ubiquitous in computing, so it seemed a reasonable place to start. We programmed an O(n^2) sort so that a reasonable amount of time would be taken on each of the server machines, creating a measure of both communication and computation times. Boss and workers are programmed using Java, PERL, and Python. Each application is implemented with exactly the same functionality. The boss machine creates sets of integers whose cardinality varies between 1,000 and 256,000. Each set is then evenly divided and distributed among the worker machines. Each worker sorts its portion of the data, yielding a highly parallel solution. The communication from boss to worker to boss is handled through native sockets. Sockets provide the foundation of all Web applications and consequently impact the performance of any Web based process. As is often the case, the boss does not participate in the primary task, but rather distributes the work and waits for results. Timings include dividing and distributing the data, sorting each data set, receiving the sorted results from each worker, and merging to arrive at a totally sorted set of data.

9 Results

Datasets of sizes 1000, 2000, 4000, 8000, 16000, 32000, 64000, 128000, and 256000 were sorted and timed. Since the machines used in this experiment were not dedicated to the researchers, times were taken at 3:00 AM, when network traffic and users were at a minimum. These machines are lightly used, so unplanned impact on communication is virtually nonexistent. The authors have carefully checked the algorithms; each is correct and implemented in a consistent manner. The timing results are in Figure 2.
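The timed steps listed in the Evaluation section (divide, sort each portion, merge) can be sketched in Python 3 as follows. The paper does not say which quadratic sort was used, so the selection sort here is an assumption, as are the function names; the portions are sorted one after another in a single process, so the sketch shows the data flow only and not the parallel timing.

```python
import heapq
import random
import time


def quadratic_sort(values):
    """Selection sort: a simple O(n^2) sort, used here as the worker's task."""
    items = list(values)
    for i in range(len(items)):
        smallest = i
        for j in range(i + 1, len(items)):
            if items[j] < items[smallest]:
                smallest = j
        items[i], items[smallest] = items[smallest], items[i]
    return items


def split_evenly(values, parts):
    """Divide values into `parts` contiguous portions of near-equal size."""
    size, extra = divmod(len(values), parts)
    portions, start = [], 0
    for k in range(parts):
        end = start + size + (1 if k < extra else 0)
        portions.append(values[start:end])
        start = end
    return portions


def sort_distributed(values, parts):
    """Sort each portion independently, then merge the sorted portions."""
    sorted_portions = [quadratic_sort(p) for p in split_evenly(values, parts)]
    return list(heapq.merge(*sorted_portions))


if __name__ == "__main__":
    data = [random.randint(0, 10**6) for _ in range(2000)]
    start = time.perf_counter()
    result = sort_distributed(data, 4)
    elapsed = time.perf_counter() - start
    print(f"sorted {len(result)} integers in {elapsed:.3f} s")
```

The merge step uses heapq.merge, which combines already sorted sequences in a single pass, so the boss's share of the computation stays small compared with the quadratic sort done by each worker.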
We note that the efficiency of PERL is significantly less than that of Java. A variety of considerations were employed to increase PERL's efficiency, each with limited success. PERL 5.8.0 ran 90% as fast as PERL 5.8.3, so one should certainly ensure the latest version of PERL is used. Also, threading is relatively new to PERL, only coming into existence in the 5.8.x versions. Threaded PERL 5.8.3 runs 96% as fast as a non-threaded installation. In a distributed application, installation of a threaded version of PERL is essential. The PERL application can be compiled to native code using perlcc. perlcc is not yet stable and should not be used for production applications, but it did generate native code. However, the native code executed only 92% as fast as the code directly interpreted. We expect a significant increase in performance when perlcc is enhanced.

The authors were surprised to see the performance comparison between Python, PERL, and Java. We expected Python to perform much better when compared to the other languages.

A less defined measure is programmer productivity. We carefully monitored our development time for each of the applications. While this is somewhat arbitrary, ill defined, and rather qualitative, we believe our experience is suggestive of what one can expect when considering these environments. Upon evaluation of our respective experiences in each environment, we believe programmer time was most efficient in Python, with PERL a close second, and that development of the Java application took approximately 30% longer. We realize this consideration is highly arbitrary, but believe programmer productivity is a significant factor in the cost of application development.

Processors  2                                4                                8
Data Size   PERL       Python     Java       PERL       Python     Java       PERL       Python     Java
1000        0.943      0.914      0.247      0.59       0.341      0.142      0.579      0.087      0.212
2000        2.271      3.616      0.284      1.155      1.308      0.199      0.719      0.331      0.257
4000        8.09       21.174     0.367      3.357      5.261      0.23       1.276      1.312      0.268
8000        31.046     84.569     0.546      12.058     21.174     0.236      3.49       5.291      0.289
16000       122.327    338.504    1.474      46.338     84.735     0.488      12.243     21.305     0.338
32000       490.912    1353.233   4.931      178.554    339.36     1.411      46.958     85.446     0.553
64000       1983.205   5426.464   17.973     682.562    1354.352   4.837      180.644    339.825    1.468
128000      7984.755   21684.42   70         2553.147   5414.646   17.68      693.55     1362.639   4.921
256000      31653.09   87170.28   308.03     9330.001   21712.73   68.136     2568.75    5456.379   17.923

Figure 2 Efficiency Timings

10 Summary and Conclusions

If the fastest application processing possible is the goal, PERL, Python, or Java would not be the choice. PVM or MPI with native routines will outperform all the above applications. However, when developer productivity, ease of programming, portability, and prototyping are desired, PERL and Python are viable alternatives.
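The timings in Figure 2 can also be read as speedup ratios. The Python 3 sketch below copies the 256,000-element row for 2 and 8 processors and compares the observed ratio with a simple model: if each worker sorts n/p elements with a quadratic sort, sort time per worker is proportional to (n/p)^2, so moving from 2 to 8 workers would cut it by a factor of (8/2)^2 = 16. The model is an assumption made for illustration; it ignores communication and merge time, and the function names are hypothetical.

```python
# Times for the 256,000-element data set, copied from Figure 2.
TIMES_256000 = {
    "PERL":   {2: 31653.09, 8: 2568.75},
    "Python": {2: 87170.28, 8: 5456.379},
    "Java":   {2: 308.03,   8: 17.923},
}


def observed_speedup(language, p_from=2, p_to=8):
    """Ratio of the measured time on p_from processors to that on p_to."""
    times = TIMES_256000[language]
    return times[p_from] / times[p_to]


def quadratic_model_speedup(p_from=2, p_to=8):
    """Expected ratio if time is proportional to (n/p)^2 (assumed model)."""
    return (p_to / p_from) ** 2


if __name__ == "__main__":
    model = quadratic_model_speedup()
    for language in TIMES_256000:
        ratio = observed_speedup(language)
        print(f"{language:6s} observed {ratio:5.2f}x, model {model:.0f}x")
```

For this row the observed ratios are roughly 12x for PERL, 16x for Python, and 17x for Java, so the quadratic cost of the sort accounts for most of the gain from adding workers.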
PERL and Python support a robust and convenient API for implementing distributed and Web based applications. There are many Web based applications programmed in PERL, since it is easy to implement. Java's socket communication is efficient, which accounts for some of its performance. Also, interpreting byte code is more efficient than pure interpretation. Note that each application scales appropriately with the size of the data set.

11 Future Research

In the Linda-like environment of JavaSpaces, efficiency in a non-reliable networking environment should be studied to determine the impact of a loaded network or failed processes. One of the advantages of JavaSpaces is its ability to recover from failure of processes, processors, or the network. PERL provides a SOAP-Lite Web based environment. Ease of programming and efficiency of applications development should be considered.

12 Bibliography

[1] http://www.perl.org/
[2] http://www.python.org
[3] Wall, L., Christiansen, T., and Orwant, J. Programming Perl. O'Reilly (2000).
[4] Liu, M. Distributed Computing. Addison Wesley. Boston (2004).
[5] Stein, L. Network Programming with Perl. Addison Wesley. Boston (2001).
[6] Christiansen, T. and Torkington, N. Perl Cookbook. O'Reilly. Sebastopol, CA (1998).
[7] Schwartz, R. Speeding up Your Perl Programs. Sys Admin. CMP Media LLC. November 2003, pp. 43-44.