Chapter 6 Load Balancing
Part I. Preliminaries
Part II. Tightly Coupled Multicore
  Chapter 2. Parallel Loops
  Chapter 3. Parallel Loop Schedules
  Chapter 4. Parallel Reduction
  Chapter 5. Reduction Variables
  Chapter 6. Load Balancing
  Chapter 7. Overlapping
  Chapter 8. Sequential Dependencies
  Chapter 9. Strong Scaling
  Chapter 10. Weak Scaling
  Chapter 11. Exhaustive Search
  Chapter 12. Heuristic Search
  Chapter 13. Parallel Work Queues
Part III. Loosely Coupled Cluster
Part IV. GPU Acceleration
Part V. Map-Reduce
The Euler totient function of a number n, denoted Φ(n), is the number of numbers in the range 1 through n − 1 that are relatively prime to n. Two numbers are said to be relatively prime if they have no factors in common other than 1. The Euler totient function is the foundation upon which the security of the RSA public key cryptosystem rests. An RSA public key consists of a modulus n and an exponent e. The modulus is the product of two large prime numbers. The prime numbers are typically 300 or more digits long, so n is typically 600 or more digits long. If you could compute the Euler totient of an RSA public key modulus, you would be able to decrypt messages encrypted with that key. This would let you hack into the connection between a web browser and a secure web site, steal passwords and credit card numbers, and generally bring Internet commerce to a standstill. So far, however, no one has managed to invent an algorithm that computes the Euler totient of the product of two 300-digit prime numbers in a practical amount of time.

Listing 6.1 is a sequential program to compute Φ(n). The program simply loops through every number i from 1 through n − 1, determines whether i and n are relatively prime, and increments a counter if so. The program decides whether i and n are relatively prime by computing a list of the prime factors of each number and comparing the lists to see whether they have any entries in common. The program factorizes a number using trial division, similar to the algorithm in the primality testing program in Chapter 2.

For example, suppose I am computing Φ(n) for n = 100. The prime factors of 100 are 2, 2, 5, 5. Next I compute the prime factors of every number i from 1 to 99. For example, the prime factors of 35 are 5, 7. Comparing the lists of factors, I see that 35 and 100 have a factor in common, namely 5. Therefore 35 and 100 are not relatively prime.
On the other hand, the prime factors of 39 are 3, 13; 39 and 100 have no factors in common (other than 1); so 39 and 100 are relatively prime. It turns out that the following numbers are relatively prime to 100: 1, 3, 7, 9, 11, 13, 17, 19, 21, 23, 27, 29, 31, 33, 37, 39, 41, 43, 47, 49, 51, 53, 57, 59, 61, 63, 67, 69, 71, 73, 77, 79, 81, 83, 87, 89, 91, 93, 97, and 99. There are 40 of them, so Φ(100) = 40.

This is a horribly inefficient way to compute the Euler totient. I'm not trying to develop the world's finest Euler totient program; I'm using this example to make a point about parallel programming. Bear with me.

To parallelize this program, note that there are no sequential dependencies in the loop: the computation for each number i can be done independently of every other number. So I can change the loop to a parallel for loop and use the parallel reduction pattern to add all the per-thread counters together. Listing 6.2 is the result.

Here is what each program printed when told to compute the Euler totient of n = 10,000,019 on a four-core tardis node. Because this n is prime, every number from 1 through n − 1 is relatively prime to n, so Φ(n) = n − 1 in this case.
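The factor-list comparison can be cross-checked with a simpler fact: i and n are relatively prime exactly when gcd(i, n) = 1. Here is a minimal sketch using Euclid's algorithm (this is my own check, not the chapter's program; class and method names are mine):

```java
public class TotientCheck {
    // Euclid's algorithm: greatest common divisor of a and b.
    static long gcd(long a, long b) {
        while (b != 0) {
            long t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    // Phi(n) = count of i in 1 .. n-1 with gcd(i, n) == 1.
    static long totient(long n) {
        long phi = 0;
        for (long i = 1; i < n; ++i)
            if (gcd(i, n) == 1)
                ++phi;
        return phi;
    }

    public static void main(String[] args) {
        System.out.println(totient(100)); // prints 40, agreeing with the worked example
    }
}
```

Running this confirms Φ(100) = 40, the same answer obtained above by comparing factor lists.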
package edu.rit.pj2.example;

import edu.rit.pj2.Task;
import edu.rit.util.LongList;

public class TotientSeq extends Task {
    long n;
    long phi;
    LongList nFactors = new LongList();
    LongList iFactors = new LongList();

    // Main program.
    public void main(String[] args) throws Exception {
        // Validate command line arguments.
        if (args.length != 1) usage();
        n = Long.parseLong(args[0]);

        // Compute totient.
        phi = 0;
        factorize(n, nFactors);
        for (long i = 2; i < n; ++i)
            if (relativelyPrime(factorize(i, iFactors), nFactors))
                ++phi;

        // Print totient.
        System.out.printf("%d%n", phi + 1);
    }

    // Store a list of the prime factors of x in ascending order
    // in the given list.
    private static LongList factorize(long x, LongList list) {
        list.clear();
        long p = 2;
        long psqr = p * p;
        while (psqr <= x) {
            if (x % p == 0) {
                list.addLast(p);
                x /= p;
            } else {
                p = (p == 2) ? 3 : p + 2;
                psqr = p * p;
            }
        }
        if (x != 1) list.addLast(x);
        return list;
    }

Listing 6.1. TotientSeq.java (part 1)
$ java pj2 debug=makespan edu.rit.pj2.example.TotientSeq \
    10000019
Job 9 makespan msec
$ java pj2 debug=makespan edu.rit.pj2.example.TotientSmp \
    10000019
Job 10 makespan msec

Unlike the previous parallel programs we've studied, the TotientSmp program's speedup is not very close to the ideal speedup of 4. Ideally, the TotientSmp program's running time on the four-core node should have been one-quarter of the sequential program's running time. What's going on?

To find out, we have to look at how much time each parallel team thread spends executing its portion of the parallel loop. I modified the TotientSmp program to measure and print each thread's running time. Ideally, each thread should spend the same amount of time in the parallel loop. But the measurements show they do not; some threads take more time than others, some take less. Because the parallel loop does not finish until the longest-running thread finishes, the program's running time ends up being larger, and its speedup smaller, than they should be.

This situation, where the parallel team threads take different amounts of time to execute, is called an unbalanced load. An unbalanced load is undesirable; it causes the parallel program to take more time than necessary. Rather, we want the program to have a balanced load, where every thread takes about the same amount of time.

But why is the TotientSmp program's load unbalanced? After all, the parallel for loop is using the default schedule, namely a fixed schedule, which gives each parallel team thread an equal portion of the loop iterations. However, in the TotientSmp program, the code executed in each loop iteration takes a different amount of time. Why? Because as the number i increases, the factorize() method takes longer and longer to compute the prime factors of i. Thus, the higher-ranked threads (the ones that factorize the larger values of i) take longer than the lower-ranked threads. This leads to the unbalanced load.
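The imbalance can be seen without timing any threads: count the trial-division steps needed to factorize each i, and sum them over each fixed-schedule chunk. In this sketch (my own model of the workload, not the book's measurement code; the step count mimics the chapter's factorize() loop), the higher-ranked chunks accumulate far more work:

```java
public class FixedLoadModel {
    // Number of trial-division loop iterations factorize(x) would perform.
    static long workUnits(long x) {
        long steps = 0;
        long p = 2;
        while (p * p <= x) {
            ++steps;
            if (x % p == 0) x /= p;
            else p = (p == 2) ? 3 : p + 2;
        }
        return steps;
    }

    // Total work units over one chunk of the loop index range lb .. ub.
    static long chunkWork(long lb, long ub) {
        long total = 0;
        for (long i = lb; i <= ub; ++i) total += workUnits(i);
        return total;
    }

    public static void main(String[] args) {
        long n = 100_000;
        long per = (n - 2) / 4;  // fixed schedule: four equal chunks of 2 .. n-1
        for (int t = 0; t < 4; ++t) {
            long lb = 2 + t * per;
            long ub = (t == 3) ? n - 1 : lb + per - 1;
            System.out.println("thread " + t + ": " + chunkWork(lb, ub) + " work units");
        }
    }
}
```

The printed totals grow from thread 0 to thread 3, which is exactly the unbalanced load the measurements revealed.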
In the previous chapters' parallel programs, each parallel for loop iteration took the same amount of time; so the load was inherently balanced, and using the default fixed schedule resulted in close-to-ideal speedups.

How can we get a balanced load in the TotientSmp program? We have to use a different parallel for loop schedule. Instead of dividing the parallel for loop iterations into four large chunks (one chunk for each parallel team
    // Determine whether two numbers are relatively prime, given
    // their lists of factors.
    private static boolean relativelyPrime(LongList xFactors, LongList yFactors) {
        int xSize = xFactors.size();
        int ySize = yFactors.size();
        int ix = 0;
        int iy = 0;
        long x, y;
        while (ix < xSize && iy < ySize) {
            x = xFactors.get(ix);
            y = yFactors.get(iy);
            if (x == y) return false;
            else if (x < y) ++ix;
            else ++iy;
        }
        return true;
    }

    // Print a usage message and exit.
    private static void usage() {
        System.err.println("Usage: java pj2 " +
            "edu.rit.pj2.example.TotientSeq <n>");
        throw new IllegalArgumentException();
    }

    // Specify that this task requires one core.
    protected static int coresRequired() {
        return 1;
    }
}

Listing 6.1. TotientSeq.java (part 2)

package edu.rit.pj2.example;

import edu.rit.pj2.LongLoop;
import edu.rit.pj2.Task;
import edu.rit.pj2.vbl.LongVbl;
import edu.rit.util.LongList;

public class TotientSmp extends Task {
    long n;
    LongVbl phi;
    LongList nFactors = new LongList();

    // Main program.
    public void main(String[] args) throws Exception {

Listing 6.2. TotientSmp.java (part 1)
thread) as the fixed schedule does, let's divide the loop iterations into many small chunks with, say, 1000 iterations in each chunk. Then let the threads execute chunks in a dynamic fashion. Each thread starts by executing one chunk. When a thread finishes its chunk, it executes the next available chunk. When all chunks have been executed, the parallel for loop finishes. This way, some threads execute fewer longer-running chunks, other threads execute more shorter-running chunks, the threads finish at roughly the same time, and the load is balanced. This is called a dynamic schedule.

You can specify the parallel for loop schedule and chunk size on the pj2 command line by including the schedule and chunk parameters. These override the default fixed schedule. Here is the same TotientSmp program run on tardis, this time with a dynamic schedule and a chunk size of 1000 iterations:

$ java pj2 debug=makespan schedule=dynamic chunk=1000 \
    edu.rit.pj2.example.TotientSmp 10000019
Job 11 makespan msec

This time the speedup was 3.964, very close to an ideal speedup. The dynamic schedule has indeed balanced the load, as is evident from the parallel team threads' individual running times.

With a dynamic schedule, there's a tradeoff. If the chunk size is too large, the load can become unbalanced again, like a fixed schedule, and the unbalanced load can result in a longer running time. However, if the chunk size is too small, there will be extra overhead in the program, as the parallel loop has to generate and feed more chunks to the parallel team threads. This extra overhead can also result in a longer running time. It's not always apparent what the best chunk size should be for a dynamic schedule.

As an alternative, you can specify a proportional schedule. Instead of specifying chunks of a certain size, you specify a certain number of chunks. The number of chunks is the number of threads times a chunk factor.
The set of loop iterations is partitioned into that many equal-sized chunks, and the threads execute these chunks in a dynamic fashion. As the number of threads increases, the number of chunks also increases proportionally, and the chunks become smaller. Here is the same TotientSmp program run on tardis with a proportional schedule:

$ java pj2 debug=makespan schedule=proportional \
    edu.rit.pj2.example.TotientSmp 10000019
Job 12 makespan msec
        // Validate command line arguments.
        if (args.length != 1) usage();
        n = Long.parseLong(args[0]);

        // Compute totient.
        phi = new LongVbl.Sum(0);
        factorize(n, nFactors);
        parallelFor(2, n - 1).exec(new LongLoop() {
            LongList iFactors;
            LongVbl thrPhi;
            public void start() {
                iFactors = new LongList();
                thrPhi = (LongVbl) threadLocal(phi);
            }
            public void run(long i) {
                if (relativelyPrime(factorize(i, iFactors), nFactors))
                    ++thrPhi.item;
            }
        });

        // Print totient.
        System.out.printf("%d%n", phi.item + 1);
    }

    // Store a list of the prime factors of x in ascending
    // order in the given list.
    private static LongList factorize(long x, LongList list) {
        list.clear();
        long p = 2;
        long psqr = p * p;
        while (psqr <= x) {
            if (x % p == 0) {
                list.addLast(p);
                x /= p;
            } else {
                p = (p == 2) ? 3 : p + 2;
                psqr = p * p;
            }
        }
        if (x != 1) list.addLast(x);
        return list;
    }

    // Determine whether two numbers are relatively prime, given
    // their lists of factors.
    private static boolean relativelyPrime

Listing 6.2. TotientSmp.java (part 2)
For the above run on the four-core tardis node, the loop index range was partitioned into 40 chunks: 4 threads times the default chunk factor of 10. The speedup was 3.844; better than the fixed schedule, not quite as good as the dynamic schedule.

As another alternative, you can specify a guided schedule. Like a dynamic schedule, a guided schedule divides the parallel for loop iterations into many smaller chunks. However, the chunks are not all the same size. Earlier chunks have more iterations; later chunks have fewer iterations. This tends to balance the load automatically without needing to specify the chunk size. Here is the same TotientSmp program run on tardis with a guided schedule:

$ java pj2 debug=makespan schedule=guided \
    edu.rit.pj2.example.TotientSmp 10000019
Job 13 makespan msec

The speedup was 4.015, even better than the dynamic schedule; in fact, essentially an ideal speedup. (That it is slightly greater than 4 might be due to random measurement error.) For a guided schedule, the chunk parameter gives the minimum chunk size; if omitted, the default is 1.

When a parallel program needs load balancing, you should experiment with different parallel for loop schedules, chunk sizes, and chunk factors on typical inputs to determine the schedule that yields the smallest running time. If you choose, you can then hard-code the schedule into the program; this overrides the default schedule and any schedule specified on the pj2 command line. To get a dynamic schedule with a chunk size of 1000, write:

parallelFor(lb,ub).schedule(dynamic).chunk(1000) ...

To get a proportional schedule with a chunk factor of 100, write:

parallelFor(lb,ub).schedule(proportional).chunk(100) ...

To get a guided schedule with the default minimum chunk size, write:

parallelFor(lb,ub).schedule(guided) ...

Under the Hood

Figure 6.1 shows how various schedules partition the iterations of a parallel for loop into chunks.
The loop has N = 100 iterations and is being executed by a parallel team with K = 4 threads. A fixed schedule partitions the iterations into K chunks, each of size N/K, and assigns one chunk to each team thread. If N is not evenly divisible by K, the final chunk has fewer iterations than the other chunks. A leapfrog schedule is similar, except each team thread increments the loop index by K on each iteration instead of by 1.
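For concreteness, here is a sketch of which indices each thread receives under a fixed and a leapfrog schedule for N = 100 iterations (indices 0 through 99) and K = 4 threads. The class and method names are mine, and the rounding of the fixed chunk size is an assumption (chunk size N/K rounded up, final chunk possibly shorter), not PJ2's exact internal rule:

```java
import java.util.ArrayList;
import java.util.List;

public class SchedulePartition {
    // Fixed: thread t gets one contiguous chunk of about N/K indices.
    static List<Long> fixedChunk(long n, int k, int t) {
        long size = (n + k - 1) / k;  // chunk size, rounded up (assumed)
        List<Long> idx = new ArrayList<>();
        for (long i = t * size; i < Math.min((t + 1) * size, n); ++i)
            idx.add(i);
        return idx;
    }

    // Leapfrog: thread t gets indices t, t+K, t+2K, ...
    static List<Long> leapfrog(long n, int k, int t) {
        List<Long> idx = new ArrayList<>();
        for (long i = t; i < n; i += k)
            idx.add(i);
        return idx;
    }

    public static void main(String[] args) {
        System.out.println(fixedChunk(100, 4, 0).size());      // 25 contiguous indices
        System.out.println(leapfrog(100, 4, 1).subList(0, 3)); // [1, 5, 9]
    }
}
```

With N = 100 and K = 4 both partitions give every thread exactly 25 indices; the difference is only whether those indices are contiguous (fixed) or strided (leapfrog).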
            (LongList xFactors, LongList yFactors) {
        int xSize = xFactors.size();
        int ySize = yFactors.size();
        int ix = 0;
        int iy = 0;
        long x, y;
        while (ix < xSize && iy < ySize) {
            x = xFactors.get(ix);
            y = yFactors.get(iy);
            if (x == y) return false;
            else if (x < y) ++ix;
            else ++iy;
        }
        return true;
    }

    // Print a usage message and exit.
    private static void usage() {
        System.err.println("Usage: java pj2 " +
            "edu.rit.pj2.example.TotientSmp <n>");
        throw new IllegalArgumentException();
    }
}

Listing 6.2. TotientSmp.java (part 3)

Figure 6.1. Chunk sizes for 100 iterations and four threads
A dynamic schedule partitions the iterations into chunks of a fixed size (5 iterations in this example). The default chunk size is 1. If N is not evenly divisible by the chunk size, the final chunk has fewer iterations than the other chunks. Each chunk is assigned to a team thread at the beginning of the loop and whenever a team thread finishes its previous chunk.

A proportional schedule is similar to a dynamic schedule, except it partitions the iterations into a fixed number of chunks rather than chunks of a fixed size. The number of chunks is equal to a chunk factor C times K; the default chunk factor is 10. Thus, as K increases, the number of chunks also increases proportionally. The chunk size is N/(C·K). If N is not evenly divisible by C·K, the final chunk has fewer iterations than the other chunks.

A guided schedule is similar to a dynamic schedule, except it determines the size of each chunk on the fly. Each chunk's size is half the number of remaining iterations divided by K. If this is less than the specified minimum chunk size (default one iteration), the chunk size is the minimum chunk size. Earlier chunks have more iterations, later chunks have fewer iterations. With N = 100 and K = 4, the guided schedule's chunk sizes are 12, 11, 9, 8, 7, 6, and so on.

Points to Remember

- If different iterations of a parallel for loop can take different amounts of time to execute, use a dynamic, proportional, or guided schedule to balance the load.
- Run the parallel program on typical inputs with various schedules, chunk sizes, and chunk factors to determine the schedule that yields the smallest overall running time.
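The chunk-size rules described under the hood can be reproduced directly. The sketch below is my own model of those rules, not PJ2's code: the dynamic and guided rules follow the text exactly, while the proportional rounding (chunk size N/(C·K) rounded up, so the example uses a divisible case) is an assumption:

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkSizes {
    // Dynamic: fixed-size chunks; the final chunk may be short.
    static List<Integer> dynamic(int n, int chunk) {
        List<Integer> out = new ArrayList<>();
        for (int rem = n; rem > 0; rem -= chunk)
            out.add(Math.min(chunk, rem));
        return out;
    }

    // Proportional: C*K chunks of size n/(C*K), rounded up (assumed rounding).
    static List<Integer> proportional(int n, int k, int c) {
        return dynamic(n, (n + c * k - 1) / (c * k));
    }

    // Guided: each chunk is half the remaining iterations divided by K,
    // but never below the minimum chunk size.
    static List<Integer> guided(int n, int k, int minChunk) {
        List<Integer> out = new ArrayList<>();
        int rem = n;
        while (rem > 0) {
            int size = Math.max(minChunk, rem / (2 * k));
            size = Math.min(size, rem);   // don't overshoot the end of the range
            out.add(size);
            rem -= size;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(dynamic(100, 5).size());          // 20 chunks of 5 iterations
        System.out.println(proportional(100, 4, 5).size());  // C*K = 20 equal chunks
        System.out.println(guided(100, 4, 1).subList(0, 6)); // [12, 11, 9, 8, 7, 6]
    }
}
```

Running the guided rule with N = 100 and K = 4 reproduces the sequence 12, 11, 9, 8, 7, 6, ... quoted in the text, with the chunk sizes tapering down to the one-iteration minimum.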
BIG CPU, BIG DATA: Solving the World's Toughest Computational Problems with Parallel Computing. Alan Kaminsky
Cost Model: Work, Span and Parallelism. 1 The RAM model for sequential computation:
CSE341T 08/31/2015 Lecture 3 Cost Model: Work, Span and Parallelism In this lecture, we will look at how one analyze a parallel program written using Cilk Plus. When we analyze the cost of an algorithm
Scalable Data Analysis in R. Lee E. Edlefsen Chief Scientist UserR! 2011
Scalable Data Analysis in R Lee E. Edlefsen Chief Scientist UserR! 2011 1 Introduction Our ability to collect and store data has rapidly been outpacing our ability to analyze it We need scalable data analysis
Efficient and Robust Secure Aggregation of Encrypted Data in Wireless Sensor Networks
Efficient and Robust Secure Aggregation of Encrypted Data in Wireless Sensor Networks J. M. BAHI, C. GUYEUX, and A. MAKHOUL Computer Science Laboratory LIFC University of Franche-Comté Journée thématique
International Journal of Information Technology, Modeling and Computing (IJITMC) Vol.1, No.3,August 2013
FACTORING CRYPTOSYSTEM MODULI WHEN THE CO-FACTORS DIFFERENCE IS BOUNDED Omar Akchiche 1 and Omar Khadir 2 1,2 Laboratory of Mathematics, Cryptography and Mechanics, Fstm, University of Hassan II Mohammedia-Casablanca,
What is Multi Core Architecture?
What is Multi Core Architecture? When a processor has more than one core to execute all the necessary functions of a computer, it s processor is known to be a multi core architecture. In other words, a
qwertyuiopasdfghjklzxcvbnmqwerty uiopasdfghjklzxcvbnmqwertyuiopasd fghjklzxcvbnmqwertyuiopasdfghjklzx cvbnmqwertyuiopasdfghjklzxcvbnmq
qwertyuiopasdfghjklzxcvbnmqwerty uiopasdfghjklzxcvbnmqwertyuiopasd fghjklzxcvbnmqwertyuiopasdfghjklzx cvbnmqwertyuiopasdfghjklzxcvbnmq Introduction to Programming using Java wertyuiopasdfghjklzxcvbnmqwertyui
Cryptography and Network Security Prof. D. Mukhopadhyay Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur
Cryptography and Network Security Prof. D. Mukhopadhyay Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No. # 11 Block Cipher Standards (DES) (Refer Slide
Tutorial: Getting Started
9 Tutorial: Getting Started INFRASTRUCTURE A MAKEFILE PLAIN HELLO WORLD APERIODIC HELLO WORLD PERIODIC HELLO WORLD WATCH THOSE REAL-TIME PRIORITIES THEY ARE SERIOUS SUMMARY Getting started with a new platform
RSA Attacks. By Abdulaziz Alrasheed and Fatima
RSA Attacks By Abdulaziz Alrasheed and Fatima 1 Introduction Invented by Ron Rivest, Adi Shamir, and Len Adleman [1], the RSA cryptosystem was first revealed in the August 1977 issue of Scientific American.
Istanbul Şehir University Big Data Camp 14. Hadoop Map Reduce. Aslan Bakirov Kevser Nur Çoğalmış
Istanbul Şehir University Big Data Camp 14 Hadoop Map Reduce Aslan Bakirov Kevser Nur Çoğalmış Agenda Map Reduce Concepts System Overview Hadoop MR Hadoop MR Internal Job Execution Workflow Map Side Details
Leveraging Aparapi to Help Improve Financial Java Application Performance
Leveraging Aparapi to Help Improve Financial Java Application Performance Shrinivas Joshi, Software Performance Engineer Abstract Graphics Processing Unit (GPU) and Accelerated Processing Unit (APU) offload
Table of Contents. Bibliografische Informationen http://d-nb.info/996514864. digitalisiert durch
1 Introduction to Cryptography and Data Security 1 1.1 Overview of Cryptology (and This Book) 2 1.2 Symmetric Cryptography 4 1.2.1 Basics 4 1.2.2 Simple Symmetric Encryption: The Substitution Cipher...
Locality-Preserving Dynamic Load Balancing for Data-Parallel Applications on Distributed-Memory Multiprocessors
JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 18, 1037-1048 (2002) Short Paper Locality-Preserving Dynamic Load Balancing for Data-Parallel Applications on Distributed-Memory Multiprocessors PANGFENG
BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB
BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB Planet Size Data!? Gartner s 10 key IT trends for 2012 unstructured data will grow some 80% over the course of the next
OBM / FREQUENTLY ASKED QUESTIONS (FAQs) Can you explain the concept briefly on how the software actually works? What is the recommended bandwidth?
Can you explain the concept briefly on how the software actually works? Leading Edge Provider s Online Backup Suite consists of 3 main modules: 1. The client software Online Backup Manager (OBM) 2. The
LOOPS CHAPTER CHAPTER GOALS
jfe_ch04_7.fm Page 139 Friday, May 8, 2009 2:45 PM LOOPS CHAPTER 4 CHAPTER GOALS To learn about while, for, and do loops To become familiar with common loop algorithms To understand nested loops To implement
Chapter 8: Bags and Sets
Chapter 8: Bags and Sets In the stack and the queue abstractions, the order that elements are placed into the container is important, because the order elements are removed is related to the order in which
MATH 168: FINAL PROJECT Troels Eriksen. 1 Introduction
MATH 168: FINAL PROJECT Troels Eriksen 1 Introduction In the later years cryptosystems using elliptic curves have shown up and are claimed to be just as secure as a system like RSA with much smaller key
CSE373: Data Structures and Algorithms Lecture 3: Math Review; Algorithm Analysis. Linda Shapiro Winter 2015
CSE373: Data Structures and Algorithms Lecture 3: Math Review; Algorithm Analysis Linda Shapiro Today Registration should be done. Homework 1 due 11:59 pm next Wednesday, January 14 Review math essential
Storage Classes CS 110B - Rule Storage Classes Page 18-1 \handouts\storclas
CS 110B - Rule Storage Classes Page 18-1 Attributes are distinctive features of a variable. Data type, int or double for example, is an attribute. Storage class is another attribute. There are four storage
16.1 MAPREDUCE. For personal use only, not for distribution. 333
For personal use only, not for distribution. 333 16.1 MAPREDUCE Initially designed by the Google labs and used internally by Google, the MAPREDUCE distributed programming model is now promoted by several
Chapter 12: Multiprocessor Architectures. Lesson 01: Performance characteristics of Multiprocessor Architectures and Speedup
Chapter 12: Multiprocessor Architectures Lesson 01: Performance characteristics of Multiprocessor Architectures and Speedup Objective Be familiar with basic multiprocessor architectures and be able to
CUDA programming on NVIDIA GPUs
p. 1/21 on NVIDIA GPUs Mike Giles [email protected] Oxford University Mathematical Institute Oxford-Man Institute for Quantitative Finance Oxford eresearch Centre p. 2/21 Overview hardware view
Getting Started with SandStorm NoSQL Benchmark
Getting Started with SandStorm NoSQL Benchmark SandStorm is an enterprise performance testing tool for web, mobile, cloud and big data applications. It provides a framework for benchmarking NoSQL, Hadoop,
Performance Evaluations of Graph Database using CUDA and OpenMP Compatible Libraries
Performance Evaluations of Graph Database using CUDA and OpenMP Compatible Libraries Shin Morishima 1 and Hiroki Matsutani 1,2,3 1Keio University, 3 14 1 Hiyoshi, Kohoku ku, Yokohama, Japan 2National Institute
Hash Tables. Computer Science E-119 Harvard Extension School Fall 2012 David G. Sullivan, Ph.D. Data Dictionary Revisited
Hash Tables Computer Science E-119 Harvard Extension School Fall 2012 David G. Sullivan, Ph.D. Data Dictionary Revisited We ve considered several data structures that allow us to store and search for data
Load balancing Static Load Balancing
Chapter 7 Load Balancing and Termination Detection Load balancing used to distribute computations fairly across processors in order to obtain the highest possible execution speed. Termination detection
