Chapter Eight

High-Throughput Computing in the Sciences

Mark Morgan and Andrew Grimshaw

Contents
1. What is an HTC Application?
2. HTC Technologies
   2.1. Scripting languages
   2.2. Batch queuing systems
   2.3. Portable batch system
3. High-Throughput Computing Examples
   3.1. Data transformation
   3.2. Parameter space studies
   3.3. Monte Carlo simulations
   3.4. Problem decomposition
   3.5. Iterative refinement
4. Advanced Topics
   4.1. Resource restrictions
   4.2. Checkpointing
   4.3. File staging
   4.4. Grid systems
5. Summary
References
Abstract

While it is true that the modern computer is many orders of magnitude faster than that of yesteryear, this tremendous growth in CPU clock rates is now over. Unfortunately, however, the growth in demand for computational power has not abated; whereas researchers a decade ago could simply wait for computers to get faster, today the only solution to the growing need for more powerful computational resources lies in the exploitation of parallelism. Software parallelization falls generally into two broad categories: "true parallel" and high-throughput computing. This chapter focuses on the latter of these two types of parallelism. With high-throughput computing, users can run many copies of their software at the same time across many different computers. This technique for achieving parallelism is powerful in its ability to provide high degrees of parallelism, yet simple in its conceptual implementation. This chapter covers various patterns of high-throughput computing usage and the skills and techniques necessary to take full advantage of them. By utilizing numerous examples and sample codes and scripts, we hope to provide the reader not only with a deeper understanding of the principles behind high-throughput computing, but also with a set of tools and references that will prove invaluable as she explores software parallelism with her own software applications and research.

Department of Computer Science, University of Virginia, Charlottesville, Virginia, USA

Methods in Enzymology, Volume 467, ISSN 0076-6879, DOI: 10.1016/S0076-6879(09)67008-7
© 2009 Elsevier Inc. All rights reserved.
While it is true that the modern computer is many orders of magnitude faster than that of yesteryear, this tremendous growth in CPU clock rates is now over. Unfortunately, however, the growth in demand for computational power has not abated; whereas researchers a decade ago could simply wait for computers to get faster, today the only solution to the growing need for more powerful computational resources lies in the exploitation of parallelism. Parallel computing can be broken down into two broad categories: capability parallelism and capacity parallelism. Capability computation, or what we sometimes refer to as "true parallel" computing, refers to a single large application running on many computers at the same time, with the various parallel components communicating among themselves. In contrast, capacity parallelism involves many copies of an application all running simultaneously but in isolation from the other parallel components. This type of parallelism, sometimes called high-throughput computing (or HTC), is the subject of this chapter. Unlike true parallel applications, which must often communicate among all participating components, HTC relies only on an initial setup and a final communication of results. During execution, each component of an HTC application works independently without any regard to the state or progress of its sibling tasks. This means that HTC applications are both easy to create and largely agnostic of computational setup: they run equally well on a cluster of machines front-ended by a batch system like PBS (http://www.pbsgridworks.com), LSF (http://www.platform.com), Condor (http://www.cs.wisc.edu/condor; Thain et al., 2005), or SGE (http://www.sun.com/software/sge) as they do on a compute grid or cloud. Most importantly, HTC is applicable to an incredibly large and diverse range of applications. Consider the example of a single application that can analyze a satellite picture to determine if there are any geographic features of interest (perhaps indications of oil or valuable minerals). The thorough explorer wants to examine thousands of such pictures to determine the next best location to drill or mine. However, it takes far too long to run the program 1000 times in a row. Fortunately, there is no need to do so. Instead, our researcher creates a simple BASH shell script to submit and monitor jobs in a batch system.
This system, in turn, runs many copies of the program at the same time, each with different pictures to analyze. What could have taken weeks on one computer now takes mere hours on his or her company's cluster. We start this chapter with a description of what makes an application an HTC application. Then, we examine some of the technologies that exist in support of high-throughput computation. From there, we go through a number of examples to illustrate the various ways in which HTC applications are organized and managed. We then examine a few of the more advanced topics as they relate to high-throughput applications, and we finish with a summary of what we have learned.
1. What is an HTC Application?

There is no strict definition of an HTC application. Computer scientists tend to define HTC in terms of how it is different from high-performance or parallel computing. Wikipedia suggests that the main differences have to do with execution times and coupling. Most parallel applications are tightly coupled,[1] while HTC tends to be very loosely coupled (http://en.wikipedia.org/wiki/High_Throughput_Computing). More generally, we tend to say that a true parallel application is a collection of computational components all running at the same time and cooperatively working to solve a single problem; in essence, it is a single large application split among a number of computational resources. In contrast, an HTC application is really a number of identical programs each running simultaneously on a group of computers and each working on a different set of input data. Sometimes called "bag-of-tasks" or "parameter sweep" applications, HTC jobs can more formally be described in terms of sets of inputs and associated results. Consider the set of inputs X = {x1, x2, ..., xn}. Any given input xi represents some arbitrary collection of files and/or command line parameters used by a sequential program which implements the function f such that it produces the result ri. Therefore, the HTC application is defined to fill in the result set R = {r1, r2, ..., rn} such that ri = f(xi) for all xi ∈ X. Another way of saying this is that for all inputs of interest, a program is run for each input that produces a corresponding resultant output. Thus, the HTC application is the conglomeration of these mappings. Many HTC applications begin as a single sequential program that someone writes to solve a problem or answer a question. For example, what is the lift produced by this wing configuration? How similar is this protein sequence to that of a frog's? What does the computer generated scene look like for this frame of this movie? Each of these programs can run as a single job on a single machine. They become HTC applications when someone turns around and decides to run the program 1000 times with different inputs. The questions change accordingly. What does the space of solutions for wing configurations look like? Which protein sequence is closest to my sample? What does the entire scene from the movie look like?

[1] The term "tightly coupled" refers to the tendency of these applications to require frequent communications between the constituent parallel components.
2. HTC Technologies

A number of technologies exist that are an integral part of HTC. These include scripting languages, which provide an invaluable tool for managing and manipulating high-throughput jobs, as well as systems like batch queues and grids that do the actual work of running the various instances of the job. In this section, we examine a set of these tools and describe what role they play in the HTC application.
2.1. Scripting languages

Though not specifically designed to support HTC, the scripting language is one of the simplest yet most valuable tools in its support. Because the applications in question are often sequential applications that were never designed for the use to which HTC puts them, they often lack the management capabilities necessary to perform the large tasks needed. Scripting languages provide a convenient and quick way to solve this problem by giving the user a medium in which he or she may quickly write tools to organize, launch, monitor, and collect results from the various parallel job instances. Without scripting languages, users would have to control HTC applications by hand, typing in thousands of inputs and launching each job individually from the keyboard, an unscalable and intractable approach as the problem becomes increasingly large. A number of scripting languages are commonly used in scientific computing. These include the standard UNIX shell languages like BASH, CSH, and KSH as well as more advanced languages like Python (http://www.python.org), Tcl (Ousterhout, 1994), and Perl (http://www.perl.com). Often the language chosen has more to do with user/programmer familiarity than with language features and power. However, no matter what language you choose to use, the goals are the same: to write a program in an environment where one can easily and rapidly interact with external tools or programs (often programs as simple as ls,[2] cat, grep, sed, awk, etc.).

[2] While HTC jobs can be launched and managed from any type of computer, they are most frequently run on UNIX machines due to the rich set of commands and scripting languages available on those platforms. For that reason, I will tend to refer to UNIX-based tools and languages.
Despite the fact that any scripting language can be used, one of the UNIX shell languages is often the candidate of choice because of their ubiquity, their familiarity, and the fact that many HTC systems (e.g., the batch systems) use shell scripts as a means of communicating job descriptions (we will see this in action when we talk about portable batch system (PBS) scripts).
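As a small illustration of the kind of glue work these languages make easy, the fragment below counts how many result files from an HTC run report successful completion and lists the inputs still missing results. It is only a sketch: the "DONE" marker and the input/ and results/ directory layout are hypothetical, but the tools used are exactly the small UNIX programs just mentioned.

#!/bin/bash

# Count the result files that contain a completion marker.
grep -l "DONE" results/*.out | wc -l

# List the inputs whose results have not yet appeared.
for IN in input/*
do
    OUT=results/`basename $IN`.out
    [ -e $OUT ] || echo "still waiting on $IN"
done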
2.2. Batch queuing systems

In order to successfully run an HTC application, a single sequential application needs to be launched or run on some number of back-end machines or resources. The inputs for these sequential jobs need to be made available to each instance of the program, and once the program has finished executing, the outputs need to be collected back together. While some users in the past have accomplished all of this using custom-made solutions of varying complexity and effectiveness,[3] by far the most common means of controlling HTC is by way of the batch queuing system or job management system. Batch systems, also known as queuing systems, are large pieces of software that monitor and manage large clusters of machines for the purpose of "doling" out those resources to jobs requesting run time on them. Systems such as PBS, LSF, SGE, and Condor work by maintaining a list of jobs that users want to run and assigning computers to those jobs as the resources become available. Further, these systems often keep track of the resources used by various jobs and individual users for the purposes of accounting and billing. Batch systems generally guarantee that when a job is given a resource, that resource is devoted to the job for the duration allotted (which may be fixed or dynamic based on job execution time). The batch system is usually responsible for getting the job started on the target resource and ultimately has the ability to stop or kill the job at its discretion. Batch systems all differ in the various details, but generally work in the same way despite those details. A job description file or submission script describes the job that a user wants to run and, when submitted by a submission tool, tells the queuing system how to run the job and what resources are required for that execution. Jobs submitted in this way result in a job token or key that can then be used by other tools to refer to that specific job. This key, which is nothing more than a unique string created by the queuing system, refers to that job for the lifetime of the job in the batch system's list. Users monitor and manipulate the jobs that the batch system is managing using other tools provided by the batch system implementation. While different batch queuing systems sometimes have different tools for submitting, monitoring, and managing jobs, a POSIX standard exists which suggests the use of qsub, qstat, and qdel as the basic tools necessary to accomplish this task. PBS, in particular, supports this standard and, given its ubiquity as a job management system, we will use these tools throughout this chapter as an exemplar of a queuing system implementation. Generally speaking, batch systems support two kinds of jobs: sequential and parallel. For the parallel case, what we usually mean is a tightly coupled, true parallel job such as an MPI (Gropp et al., 1994) or OpenMP (Chandra et al., 2000) job. Sequential refers to a single job that a user wants to run exactly once on a single computer. Given these definitions, how then is it that batch systems support HTC applications? The answer lies in the fact that while batch systems support sequential jobs in only the singleton case, they nevertheless have access to a relatively large number of resources on which they can launch those jobs. Thus, by submitting a number of copies of a single job (each presumably with slightly different inputs), one can use the batch system as a mechanism for controlling and managing large numbers of independent jobs. It is in this regard that the role of the scripting languages described previously becomes evident. Batch systems are merely the mechanism by which jobs are run and managed; more often than not, a script is responsible for orchestrating the application as a whole.

[3] A very common home-grown solution is merely the clever use of password-less ssh and simple BASH or Perl process control.
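A typical interaction with these tools looks something like the following hypothetical session (job identifiers and the exact qstat output format vary from installation to installation):

$ qsub submission-script.pbs
4217.headnode
$ qstat 4217.headnode
Job id          Name               User   Time Use S Queue
--------------- ------------------ ------ -------- - ----------
4217.headnode   submission-script  jdoe   00:00:03 R largeQueue
$ qdel 4217.headnode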
2.3. Portable batch system

PBS is one such queuing system and is one of the more common batch systems available. A number of implementations of this software exist, ranging from free, open-source versions to supported, commercially produced ones. We use PBS throughout this chapter as an exemplar batch system because of its common use and the similarities between it and other batch systems. PBS submission scripts are nothing more than shell scripts with PBS-specific instructions and restrictions embedded in the comments of the script. This technique of embedding additional information into the comments of a scripting language (or, for that matter, any language) is a common and frequently used means of extending a language beyond its original design. In fact, the "job" that the user is submitting to the PBS queue is really the shell script itself. This shell script almost always calls on another program to execute (the sequential application we talked about earlier), but it is important to realize that the "program" that PBS runs directly is the script that was submitted to the batch system and that this script can contain arbitrarily complex and intricate code. In Fig. 8.1, an example PBS submission script describes a request to run the render-frame program for frame 1 of scene 1 of a movie. There are a couple of interesting details to note about this script. First of all, the script is a standard BASH shell script, as implied by its first line. Despite this fact, a number of PBS directives follow, embedded in standard BASH script comments. These directives indicate, respectively, the name of the queue
#!/bin/bash

#PBS -q largeQueue
#PBS -o /home/jdoe/movie/stdout.txt
#PBS -e /home/jdoe/movie/stderr.txt

echo $HOSTNAME
cd /home/jdoe/movie
render-frame scene-1-frame-1.input scene-1-frame-1.tiff

Figure 8.1 Example simple PBS submission script.
where the job should be submitted,[4] the location to which standard output should be redirected as the job runs, and the location to which standard error should be directed. Next, illustrating the point that the submission script can be arbitrarily complex, we see a couple of lines of BASH script which set up the job. Finally, the binary program is run, given the name of the input frame to render and the name of the output file to generate. This last line is particularly important for a couple of reasons. First of all, the fact that the exact frame to render (and the exact output to generate) is given explicitly implies that this sequential job is only one of potentially many sequential jobs that make up the HTC application needed to render an entire scene or movie. One can imagine the collection of such PBS submission scripts that would be required to render a corresponding collection of frames, thus generating a movie sequence. Further, notice that no description is given for how the frame input was generated nor how the output tiffs are to be "glued" together into a resultant movie. This is typical for an HTC application. Most of the time, the inputs are generated through some external mechanism (perhaps another program, perhaps by hand, sometimes even by the user's HTC management script). Additionally, the resultant outputs are usually collected together using yet another piece of software. Finally, note that the submission script assumes that the data are available in the same place regardless of which machine the job ends up running on. This last expectation, namely, that all machines controlled by that queue share some portion of the file system (usually using NFS (Sun Microsystems, Inc., 1989), CIFS (Leach and Naik, 1997), Lustre (http://wiki.lustre.org/index.php/Main_Page), or some other network file system software), is a typical constraint of most batch systems.

[4] We often refer to an instance of a PBS system (or any other batch system) as a queue, but in fact these systems often have more than one virtual queue embedded.
3. High-Throughput Computing Examples

In this section, we examine more closely a number of HTC examples that illustrate patterns of computation common to scientific applications. While these patterns are separated into categories, it is worth noting that the partitioning is largely arbitrary and significant overlap between the examples may be evident. In fact, it is generally assumed that clever data decomposition and organization can transform one type of HTC application into another (e.g., a Monte Carlo simulation can be viewed as a parameter sweep where the random number seed is the parameter, etc.). Furthermore, because the mechanism by which the individual single jobs are submitted or monitored is similar enough between the various back-end systems (be they queues or grids), we use PBS as a single unifying medium in which to demonstrate the techniques in question. These techniques should translate equally well into whatever back-end technology is appropriate for the reader.
3.1. Data transformation

The first example of an HTC pattern is what we refer to as the data transformation pattern. In this pattern, we assume that the user has a program that reads data from a set of input files and generates a transformed version of that data as a set of output files. The exact transformation is irrelevant and may constitute more of an analysis than an actual transforming of the data. What is important is that the number of input data files available determines the number of output files to be generated, and therefore also the number of times that the sequential application needs to be run. The movie frame example that we gave earlier (Fig. 8.1) is a perfect example of this. For this example, assume that we have a binary called lineDetector that reads an input image file and generates a new image file that results from performing a horizontal and vertical line detection algorithm. In other words, the resultant image file contains a new image that shows the locations of the horizontal and vertical lines detected in the original image. We further assume that our input images are all located in the directory /home/jdoe/images/input and have the names input-image-1.tiff, input-image-2.tiff, etc. Our first step in generating this HTC application is to create by hand a PBS submission script for one job. We often start this way because it provides a convenient template from which to develop the rest of the HTC application. This example PBS submission script is given in Fig. 8.2.
#!/bin/bash

#PBS -q largeQueue

cd /home/jdoe/images
lineDetector input/input-image-1.tiff output/output-image-1.tiff

Figure 8.2 Example PBS submission script.
Notice in the example that we have chosen to write our template submission script using image 1 and that we have identified our resultant output image with the same number. This pattern is typical and reflects the user's desire to be able to associate the resultant images with the inputs from which they are derived. At this point, we should submit the script to our PBS system to verify that we have not made any mistakes with respect to running this job. Determining the correctness of your single job submission script is not always easy. Does the queuing system (in this case PBS) accept the submission script? Once submitted, does the job actually run or does it stay queued forever? After running, does your job produce the output that you expected? Figuring out why any of these problems occur is sometimes a black art and often requires a working familiarity with the batch system in question. If the queuing system does not accept the job, often it will tell you why. Maybe the queue you specified does not exist, or perhaps the format you gave for a resource restriction is not correct. If your job gets submitted to the queue but never runs, this can sometimes be caused by specifying resource restrictions that can never be satisfied, such as asking for 100 nodes from a queue that only has 50 nodes available. If your job seems to run but does not produce the output you expect, your program can sometimes be in error, but sometimes some aspect of the program's environment has not been set up correctly (e.g., missing libraries, library paths not set correctly, input files not made available, etc.). To solve these problems, often you will need to add statements to the submission script which indicate to the queuing system that you would like to get back the standard output and standard error streams from your job (these streams will usually contain error messages indicating what went wrong). Finally, keeping in mind that the submission script that you use to submit the job to the queue is itself a shell script that will be run on the target node, you can sometimes put appropriate debugging statements into the script itself to help you determine what is happening.
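A debugging variant of the Fig. 8.2 script might look like the following sketch (the debug-stdout.txt and debug-stderr.txt file names are arbitrary choices made here; the echo and which lines are the kind of ad hoc instrumentation meant above):

#!/bin/bash

#PBS -q largeQueue
#PBS -o /home/jdoe/images/debug-stdout.txt
#PBS -e /home/jdoe/images/debug-stderr.txt

# Record where the job ran and what environment it saw.
echo "Running on $HOSTNAME as user $USER"
echo "PATH is $PATH"
which lineDetector

cd /home/jdoe/images
lineDetector input/input-image-1.tiff output/output-image-1.tiff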
Once we are sure that the submission template script is correct, we need to write a submission manager or control script that creates and submits a single job to the PBS system for each individual sequential run that we need to perform. Because we have a directory full of input files over which we need to run the application, we will write our script to iterate through those files and submit a new job to the PBS system for each one. First, we modify our submission script template so that it contains "variables" in place of the names of the input and output files. These variables are nothing more than standard operating system environment variables, just like PATH and HOME, and are set by the batch queuing system from values given to the qsub program on the command line when you submit the job. The modified submission script is given in Fig. 8.3.

#!/bin/bash

#PBS -q largeQueue

cd /home/jdoe/images
lineDetector input/${INPUT} output/${OUTPUT}

Figure 8.3 Example simple PBS template submission script.

Notice that in this example two variables, INPUT and OUTPUT, are defined to indicate, respectively, the input and output file names to use for this job. Also notice that we have used the ${VARIABLE} notation for environment variables rather than the more common $VARIABLE syntax. This deviation from the typical is not required by the queuing system but rather is the author's preference to enhance the readability of the script. In all respects, the variables are true environment variables and may be referenced using any valid environment variable syntax. Strictly speaking, it is not necessary to write PBS submission templates like the one above. Instead, one can generate the submission scripts relatively easily on the fly, directly from the text of the control script we are about to write, using echo statements or BASH here-documents. However, because the submission script is also a BASH shell script, generating it inside of another BASH shell script is prone to errors having to do with variable substitution. Thus, for readability and clarity, we prefer to generate submission templates such as this one throughout this chapter. The final step in creating our HTC application is to create a shell script capable of iterating through the input files and, for each one, generating and submitting a PBS submission script to run the job. As an optimization, notice that the shell script first checks to see if the output file already exists. This optimization is recommended because oftentimes an HTC run will need to be repeated to fill in missing results. These missing results can be the product of imperfect batch systems or grid systems prone to failures or job loss, or simply the result of a desire to run the data translation program over additional data not available at the time the first run was initiated. Figure 8.4 shows this BASH shell script. Several features about the shell script warrant explanation. First of all, for space and readability reasons, throughout the script we have ignored
#!/bin/bash

# We make a directory to keep the submission scripts in just to
# keep our working directory from getting cluttered.
mkdir -p scripts

# Iterate over all the files in the input directory.
for INPUTPATH in input/*
do
    # For each file, determine what its name is (without the path)
    # as well as the name of the desired output file.
    INPUTFILE=`basename $INPUTPATH`
    OUTPUTFILE=`echo $INPUTFILE | sed -e "s/input/output/g"`

    # If the output file does not exist, create and submit
    # a PBS job.
    if [ ! -e output/$OUTPUTFILE ]
    then
        echo "Submitting job for input/$INPUTFILE"
        qsub -v "INPUT=$INPUTFILE,OUTPUT=$OUTPUTFILE" \
            submission-script.pbs
    fi
done

Figure 8.4 Line detector submission control script.
the possibility that file names exist with spaces in them. One can of course prevent such occurrences by design, but it is generally better to create your scripts from the outset so that they are capable of accommodating such anomalies. Finally, the script could have been simplified by choosing a simpler naming scheme for our input and output file names; specifically, had the input and output file names been the same rather than different, the script would have needed only to indicate the directory for outputs rather than translating the input file names into output names. However, once again, this would tend to reduce readability, which we prefer to preserve, given that the more complex version is more typical of real-world examples.
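For completeness, a space-safe version of the Fig. 8.4 loop would quote every variable expansion, along the lines of the sketch below (note that values containing commas or spaces can still confuse the qsub -v list itself and may need further escaping):

for INPUTPATH in input/*
do
    INPUTFILE="$(basename "$INPUTPATH")"
    OUTPUTFILE="$(echo "$INPUTFILE" | sed -e "s/input/output/g")"

    if [ ! -e "output/$OUTPUTFILE" ]
    then
        echo "Submitting job for input/$INPUTFILE"
        qsub -v "INPUT=$INPUTFILE,OUTPUT=$OUTPUTFILE" \
            submission-script.pbs
    fi
done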
3.2. Parameter space studies

Another common pattern that we often come across in HTC applications is that of the parameter space study. A parameter space study is an HTC application where we wish to run a sequential application once for each of a number of points within the input space, thus generating a matching output function. Users then analyze the output space to produce some summary of results, or to pick some optimal solution, or to insert as inputs into a different application. Consider the example where we have a sequential program that determines the lift generated by a fixed wing in air for wings of various lengths and angles of attack. We want to determine what effect changing those two parameters has on the wing's lift so that we can pick an optimal solution. This example is similar to the data translation example in that it requires us to write a shell script to submit jobs to the queue. However, for this example the number of sequential jobs to be run is determined not by a set of input files, but rather by a set of input parameter values. We start the same way, by generating the example template PBS submission script.[5] Notice that this new submission script has three variables (Fig. 8.5). We have two parameters over which we wish to iterate the parameter space, namely WINGLENGTH and WINGANGLE.

#!/bin/bash

#PBS -q largeQueue

calculateLift ${WINGLENGTH} ${WINGANGLE} ${OUTPUTFILE}

Figure 8.5 Airflow over wing submission script.

[5] In this example, we skip straight to the PBS submission script with variable tokens rather than giving an exemplar script with actual values to test. In the general case, however, you should always generate an exemplar for testing purposes as it helps to debug problems that may develop further down the road.
We also need to indicate the name of the output file that we want to generate, thus requiring the third variable, OUTPUTFILE. Once again, we will want to be able to match the output files with the input data. In this case, we will do so by naming the output file so that the name tells us what length and angle were used. Now that we have the submission template ready, we generate a shell script that can manage and submit the jobs to the PBS queue. The BASH shell script in Fig. 8.6 submits a number of jobs corresponding to an input range of wing angles and lengths. The wing lengths are given as integers representing the number of inches in the wing's length, while the angle is given as a floating point number representing the angle in degrees. This latter decision to represent the angle as a floating point number was made to illustrate a technique for iterating over floating point numbers in a BASH shell script despite the fact that the BASH scripting language cannot natively handle floating point numbers. Generally speaking, however, if you are designing your own scripts, you can create your sequential program and script in such a way as to avoid this necessity. Also, once again, note that the script first checks for the desired output file before submitting the job. This, as before, allows us to repeatedly run the example, generating only the output files that are missing (Fig. 8.6).

#!/bin/bash

# Check to make sure that the arguments are correct
if [ $# -ne 6 ]
then
    echo "USAGE: $0 <min angle> <max angle> <angle incr> <min length> <max length> <length incr>"
    exit 1
fi

# Set variables for easier readability
MINANGLE=$1; MAXANGLE=$2; ANGLEINCR=$3
MINLEN=$4; MAXLEN=$5; LENINCR=$6

# Loop through the angles requested. We have to use the
# bc program here to do the loop because BASH cannot handle
# floating point numbers natively.
while [ `echo "scale=1; $MINANGLE <= $MAXANGLE" | bc -l` -ne 0 ]
do
    # Inside the angle loop, we are going to loop through the
    # wing lengths as well. We assume that length is given as an
    # integral number of inches.
    LENGTH=$MINLEN
    while [ $LENGTH -le $MAXLEN ]
    do
        # We create a file name that reflects the angle/length.
        OUTPUT=winglift-$MINANGLE-$LENGTH.dat
        if [ ! -e $OUTPUT ]
        then
            echo "Submitting job for $OUTPUT"
            qsub -v "WINGANGLE=$MINANGLE,WINGLENGTH=$LENGTH,OUTPUTFILE=$OUTPUT" \
                submission-script.pbs
        fi

        LENGTH=$(( $LENGTH + $LENINCR ))
    done

    MINANGLE=`echo "scale=1; $MINANGLE + $ANGLEINCR" | bc -l`
done

Figure 8.6 Airflow over wing control script.
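Assuming this control script were saved as, say, wing-control.sh (a name chosen here purely for illustration), a coarse sweep over angles from 0.0 to 15.0 degrees in half-degree steps and wing lengths from 60 to 120 in. in 6-in. steps would be launched as:

./wing-control.sh 0.0 15.0 0.5 60 120 6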
A common alternative approach to this solution is a parameter space study in which the parameters themselves come from files rather than being given as actual numbers that you iterate through within the control script. Figure 8.7 gives the control script of Fig. 8.6 modified to take the wing angles from an input file. In practice, this file could contain any textual data, not just numbers.
3.3. Monte Carlo simulations

A Monte Carlo simulation is a program that generates results based on a large number of random samples. Monte Carlos generally produce nondeterministic results and are most useful when a large number of degrees of freedom exist in the space being sampled. Monte Carlo applications are classic examples of both true parallel and HTC applications, differing from one another only in the application used to produce the results and the length of time necessary to run the simulations. In this example, we will try to estimate the value of π using a Monte Carlo simulation that works by using the knowledge that the area of a circle is equal to π multiplied by the square of the radius of the circle. If you had the exact area of the circle and its exact radius, you could calculate the value of π simply by dividing the area by the square of the radius. In our simulation, we will estimate the area of a circle with a radius of one by throwing imaginary darts at a dartboard with that circle inscribed inside of it. Because we know the exact area of a square with a unit circle inscribed inside of it, we can estimate the area of the circle, and thus the value of π, by multiplying the area of the square by the ratio of darts that randomly land inside the circle to the total number thrown. However, in order to get a reasonable approximation for π, we have to throw a large number of darts. This is where our Monte Carlo simulation comes in. Assume that we have a sequential binary which, given a random[6] number seed, throws 1,000,000 imaginary darts at the dartboard described earlier (it does this by generating random x and y coordinates in the range [-1.0, 1.0]). The program then prints out the number of darts that "hit" within the unit circle. To turn this application into an HTC application, we need to generate a large number of PBS jobs, each of which runs its own 1,000,000-dart simulation. Once all of the results come back, we can then sum the results together to get our circle-to-square area ratio and thus estimate a value for π. The PBS submission template and BASH control script are given in Figs. 8.8 and 8.9. Notice that the name of the output file is once again related to the specific run. In this case, the output file name has a number indicating which "millions" of dart-throws the output represents (i.e., the first million darts, the second million, etc.). This is because there really is no distinguishing characteristic between the various results (though we could, if we wanted, store the seed number given); rather, we are merely using the file name as a convenient means of determining whether or not the output was generated for a given sequential run.

[6] Random number generation is a complex topic both in sequential applications and in parallel applications. Those details, however, are beyond the scope of this chapter and as such are left to more thorough treatments available in other texts.
#!/bin/bash

# Check to make sure that the arguments are correct
if [ $# -ne 4 ]
then
    echo "USAGE: $0 <angle file> <min length> <max length> <length incr>"
    exit 1
fi

# Set variables for easier readability
ANGLEFILE=$1
MINLEN=$2; MAXLEN=$3; LENINCR=$4

# Loop through the angles requested.
for ANGLE in `cat $ANGLEFILE`
do
    # Inside the angle loop, we are going to loop through the
    # wing lengths as well. We assume that length is given as
    # an integral number of inches.
    LENGTH=$MINLEN
    while [ $LENGTH -le $MAXLEN ]
    do
        # We create a file name that reflects the angle/length.
        OUTPUT=winglift-$ANGLE-$LENGTH.dat
        if [ ! -e $OUTPUT ]
        then
            echo "Submitting job for $OUTPUT"
            qsub -v "WINGANGLE=$ANGLE,WINGLENGTH=$LENGTH,OUTPUTFILE=$OUTPUT" \
                submission-script.pbs
        fi

        LENGTH=$(( $LENGTH + $LENINCR ))
    done
done

Figure 8.7 Airflow over wing control script redux.
#!/bin/bash

#PBS -q largeQueue

throwDarts ${SEED} > dart-results.${NUMBER}

Figure 8.8 Monte Carlo submission script.
Also, in this example, the throwDarts program does not generate an output file. Instead, it prints results to the standard output stream. This is not an uncommon occurrence, and while it can be selectively used or avoided when the user has control over the source code for the sequential binary, oftentimes the binary is a piece of legacy code which cannot, for various reasons, be modified (Fig. 8.9).
#!/bin/bash

# Check the arguments
if [ $# -ne 1 ]
then
    echo "USAGE: $0 <num iterations>"
    exit 1
fi

NUMITERS=$1

# Loop through the iterations
while [ $NUMITERS -gt 0 ]
do
    # Use BASH's built-in RANDOM variable to generate a seed
    SEED=$RANDOM

    # If the result hasn't yet been generated, submit a job
    # to create it.
    RESULTFILE=dart-results.$NUMITERS
    if [ ! -e $RESULTFILE ]
    then
        qsub -v "SEED=$SEED,NUMBER=$NUMITERS" \
            submission-script.pbs
    fi
    NUMITERS=$(( $NUMITERS - 1 ))
done

Figure 8.9 Monte Carlo control script.
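Once every job has finished, the individual hit counts still need to be summed to produce the final estimate. The following is a minimal aggregation sketch, assuming each dart-results.N file holds a single hit count and that each run threw exactly 1,000,000 darts:

#!/bin/bash

HITS=0
DARTS=0
for RESULT in dart-results.*
do
    HITS=$(( $HITS + `cat $RESULT` ))
    DARTS=$(( $DARTS + 1000000 ))
done

# The square has area 4 and the inscribed unit circle area pi,
# so pi is approximately 4 * (hits / darts).
echo "scale=6; 4 * $HITS / $DARTS" | bc -l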
3.4. Problem decomposition

So far in this chapter, we have ignored the issue of problem decomposition. Sometimes, the decomposition is either obvious or determined by the sequential program that we are using. Often, however, the user can choose how he or she wants to decompose the larger problem into a collection of smaller ones. Doing so can have a dramatic impact on the total time it takes to execute a high-throughput application as well as the overall probability that the application will finish successfully. In the advanced section of this chapter, we examine the latter of these concerns, but for now we are going to see how problem decomposition can affect the overall runtime of an HTC application. Consider the dart-throwing Monte Carlo example we looked at in the previous section. A naïve implementation might have written the throwDarts sequential program such that it threw exactly one dart instead of 1,000,000. In this way, the output files would have been numbered by dart rather than by millions of darts. However, this scheme lacks scalability and efficiency. To get a reasonably good estimate for π, we have to throw lots of darts. Assume that we needed to throw as many as 1 billion darts. If we generated a PBS job for each dart, we would have submitted 1 billion jobs into the PBS queue, and it would in turn have created 1 billion result files. From a scale point of view, neither PBS nor the file system into which the results will land is capable of handling that many items. From an efficiency perspective, running a sequential job for each dart-throw would take too long. While it might take a long time to throw 1 billion darts, it takes an incredibly short amount of time to throw one dart. On the other hand, submitting a job to PBS, having PBS run that job, and then getting the results back is a relatively hefty operation requiring multiple seconds at best to complete. We simply cannot afford to spend seconds setting up and submitting jobs that only need a few microseconds to run. Instead, we batched together a large number of dart-throws into a single run of the throwDarts program so that the time that it takes to generate, submit, and run the job through PBS is amortized by the relatively long execution time of the throwDarts program. As a general rule of thumb, you want the execution time of a single unit of your HTC application to be on the order of 100 times (or more) greater than the amount of time it takes PBS to process the job. In this way, the cost of using PBS has little impact on the overall performance of your application as a whole. At the same time, you want the number of individual submissions to the PBS queue to be large enough to get some benefit. Simply submitting a single throwDarts program to the queue that throws all 1 billion darts alone produces no benefit over simply running the program on your desktop.[7]

[7] Unless of course there is some benefit to running on a machine that the PBS batch system has access to that you do not. It is not uncommon for a user to submit a single job to a batch or grid system when that user simply cannot run the program on any other machine to which he or she has access. However, as this chapter is about HTC applications, we ignore such possibilities.
The choice of how many jobs to break the HTC application into depends on a number of factors, including how many resources the queue has access to, how many users are trying to submit jobs to the queue at any given time, and how many slots on the queue a single user is allowed to consume at any one time.[8] As a general rule of thumb, if your sequential application runs in a relatively consistent amount of time regardless of the input data, then coordinating the total number of job submissions with the number of slots available to you makes sense (e.g., if you have 10 slots available to you, having somewhere around 10 jobs might make sense, assuming that the runtime of each job is reasonable). However, if the runtime of your sequential program is highly variable depending on input, or the resources on which you are running have a high chance of failure, it makes more sense to decompose the problem into many more pieces. As jobs run through the queue, longer running jobs will consume a slot for a correspondingly large period of time, while short running jobs finish early and vacate their respective slots, leaving them available for other jobs to run. This is an optimization mechanism known as load balancing, whereby longer running jobs have a decreased effect on the overall runtime length of the batch because many smaller runtime jobs have an opportunity to execute one after another at the same time. This principle is similar to how having multiple lanes of traffic improves the overall efficiency of cars moving along the road as opposed to having a single line of traffic whose speed is ultimately determined by the slowest driver. With the darts program, the method of decomposition, if not the number, was obvious; throwing many darts is identical in concept to throwing a single dart. However, this is not always the case. Sometimes, a sequential program does not naturally decompose into obvious pieces. The example first mentioned in this chapter (in which a scene from a movie was rendered using a batch of sequential jobs, each of which rendered a single frame from the movie) might not in fact be the best decomposition of the problem. If the time required to render a single frame is relatively small, then we would want to render many frames in a single job. Similarly, if rendering a single frame took a large amount of time, we would probably want to decompose the problem such that only portions of a frame were rendered by any given job. In both cases, the sequential program (and in fact the output files generated by the hypothetical decomposition) are not necessarily available. After all, the program generates pictures representing an entire frame, not pieces of a picture, or snippets of a movie.

[8] To prevent one user's jobs from preventing another user's jobs from running, batch system administrators will often limit how many slots or nodes a user can simultaneously hold at any given time. Furthermore, the administrator will sometimes additionally limit how many jobs a user can have in the queue at any given time, regardless of how many are running.
To make these decompositions work, some amount of programming on the user's part is required. For the multiframe example, the answer is to submit jobs that are themselves scripts, each script executing the render program multiple times and generating multiple images, as in the sketch below. These images can then be tarred or zipped by the script and returned as the single result representing the snippet of the movie rendered. In the case of rendering pieces of a frame, the answer is not as simple. Unless the render program had the option to render a piece of a frame and generate a partial image file, the user would have to come up with a way of modifying the render program to perform these partial operations. He or she would also need a way of representing partial frames and later gluing them together.
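A minimal sketch of such a multiframe job script follows, reusing the render-frame binary of Fig. 8.1 and assuming hypothetical FIRST and LAST frame-range variables passed in via qsub -v:

#!/bin/bash

#PBS -q largeQueue

cd /home/jdoe/movie

# Render every frame in the range [FIRST, LAST], remembering the
# images we produce so that we bundle only our own frames.
FILES=""
FRAME=$FIRST
while [ $FRAME -le $LAST ]
do
    render-frame scene-1-frame-$FRAME.input scene-1-frame-$FRAME.tiff
    FILES="$FILES scene-1-frame-$FRAME.tiff"
    FRAME=$(( $FRAME + 1 ))
done

# Return a single result representing this snippet of the movie.
tar czf scene-1-frames-$FIRST-$LAST.tar.gz $FILES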
3.5. Iterative refinement

The last example that we will look at in this chapter with respect to typical HTC use cases is that of iterative refinement. So far, all of the examples we have examined assume that we want to run a large number of sequential jobs to generate a large number of resultant outputs. These outputs would then, presumably, be combined together at the end to produce a single result. However, sometimes the parameter space is too large and the nature of the result space too unknown for the researcher to provide a sufficient set of boundary conditions for his or her application. Maybe he or she wants to see what wing angle to the nearest 10th of a degree and nearest 16th of an inch provides the best lift, but it takes too long to run all possible combinations. Using iterative refinement, the researcher first submits a portion of an HTC job examining a relatively broad spectrum of the target parameter space. As the results from those jobs arrive, he or she can analyze them to determine which ranges of the broad parameter study show promise and can thus narrow the space and submit new jobs as a refinement of his or her parameter-sweep study. The first study might analyze wing angles in increments of 5 degrees and lengths in increments of 6 in. Based on the results of that study, new parameter spaces defined in terms of single-degree and single-inch increments are then launched for areas of interest to the researcher. Iterative refinement need not involve refinement of the parameter space. Sometimes, the refinement takes place instead in the sequential program or algorithm. A researcher comparing a protein sequence against a database of other sequences might first run an HTC application using a "quick and dirty" algorithm to determine which database sequences show promise. Then, based on these results, interesting database sequences could be compared against the test sample using a much slower but more accurate comparison algorithm. Regardless of the reason for the iterative refinement, the methods for controlling them are largely the same and, generally speaking, consist of building on what we have already seen in this chapter. Most of the differences lie in how to analyze the results of one run to generate the inputs for
the next run. Usually, a user will submit the first run using a control script like the ones we have described earlier, wait for that HTC run to fully complete, analyze the results, select parameters for the next run, and then submit the new run using either the same or a different control script. However, sometimes it makes more sense to have a single control script analyzing the results of one run as they are returned from the batch system and then deciding on the fly whether or not to generate and submit a new run based on those results. Doing so, however, is a much more complicated task involving error checking with the queue (to make sure that the job was not lost or failed), making sure that the full results are available (i.e., that the job is completely finished and not simply in the process of generating results), and (potentially) simultaneously managing runs from multiple different refinements.
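As a simple illustration of the first, manual style of refinement, the sketch below picks the most promising coarse result and launches a finer sweep around it. It assumes that each winglift output file from Fig. 8.6 contains a single lift value on one line, and it reuses the hypothetical wing-control.sh name for that control script:

#!/bin/bash

# Find the coarse result file with the greatest lift value.
BEST=`grep -H . winglift-*.dat | sort -t: -k2 -rn | head -1 | cut -d: -f1`

# File names look like winglift-<angle>-<length>.dat, so the winning
# angle and length can be recovered from the name itself.
ANGLE=`echo $BEST | cut -d- -f2`
LENGTH=`echo $BEST | cut -d- -f3 | sed -e "s/\.dat//"`

# Refine around the winner in tenth-degree and one-inch steps.
./wing-control.sh `echo "scale=1; $ANGLE - 2.5" | bc -l` \
                  `echo "scale=1; $ANGLE + 2.5" | bc -l` 0.1 \
                  $(( $LENGTH - 3 )) $(( $LENGTH + 3 )) 1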
4. Advanced Topics So far, we have covered the relatively straightforward aspects of using systems to submit and control HTC jobs. In this next section, we will take a deeper look at some of the more advanced topics having to do with HTC applications, including restricting resources for scheduling purposes, checkpointing results, and staging data to and from local file systems. This is by no means an exhaustive list but rather an introduction to a few topics of interest and importance.
4.1. Resource restrictions

It used to be the case that organizations would set up a number of queues on batch systems, each one representing a certain job type for which a particular set of resources was intended. For example, an IT department might create one queue for long-running jobs and another one for short jobs; one queue might be intended for Linux machines and the other for Solaris. Increasingly, however, it is becoming more common for an IT department to have only one queue and to rely instead on the user submitting jobs to the queue with certain restrictions indicated. For example, a user can indicate in a PBS batch script how many processors he or she wants per node, how much memory, how long the job will take to run, or even what kind of operating system is preferred. Generally speaking, in order to get the most out of your batch system, you need to supply the appropriate amount of descriptive information about your job's requirements for your IT department's resources. The PBS submission script in Fig. 8.10 shows an example in which the user has requested that his or her job be put on a machine with two CPUs per node, that it will take 10 GB of memory when executing, and that it needs to be a Linux machine of some type.
#!/bin/bash

#PBS -q largeQueue
#PBS -l ncpus=2:mem=10GB:arch=linux
#PBS -o /home/jdoe/movie/stdout.txt
#PBS -e /home/jdoe/movie/stderr.txt

echo $HOSTNAME
cd /home/jdoe/movie
render-frame scene-1-frame-1.input scene-1-frame-1.tiff

Figure 8.10 Example resource restrictions submission script.
4.2. Checkpointing

Probably the most important advanced topic, and one that is frequently overlooked when it comes to HTC, is that of checkpointing. While it would be wonderful if all jobs took 15 min to run to completion, the truth is that there are many applications that run for days, weeks, or even months. Unfortunately, it is unrealistic to assume that an application can run uninterrupted for long periods of time. Perhaps the program leaks memory, or perhaps the user is sharing the machine with another program that is leaking some operating system resource, thus making the machine unstable. Labs sometimes lose power for long periods of time, causing machines to fail while jobs are running. For that matter, it is often the case that the batch system itself is configured to kill jobs that take too long to execute.[9] In the end, regardless of the cause, the result is the same: the loss of all in-memory data and progress made on your long-running job. When you start talking about HTC applications, the odds of a long-running program failing to complete increase. By utilizing lots of machines at the same time, you inadvertently increase the chances that one of the machines on which your job is running is going to fail before it finishes. There are essentially two ways to deal with the problem of long-running jobs.

[9] Configuring a batch queuing system to limit jobs to a certain duration is often a bone of contention between users and administrators but is generally necessary to ensure fairness amongst all of the cluster's users.
One is simply to shorten your job so that it does not take as long but instead requires more runs to complete. For example, maybe each instance of your program rendered 10 frames from a movie scene. Instead of running 1000 jobs, each rendering 10 frames of the movie, you could perhaps submit 10,000 jobs where each job rendered only one frame. The other solution is to employ something called checkpointing. Checkpointing is the act of periodically recording data about the progress of your program so that, if the program should fail for whatever reason, you can simply restart the program from the last known checkpoint and continue from there. Unfortunately, checkpointing is an activity that many researchers ignore because it requires them to implement extra code that would not otherwise be necessary in a perfect world where nothing failed. Also, while a few projects have tried to make the process of checkpointing easier or automatic for applications, the truth is that none of these is perfect and the likelihood is that you will not have access to such a system. Further, it is not generally possible to describe a solution that works for all applications. Each application is different, and the nature of your application's checkpointing needs depends on how your program is structured. Furthermore, if you do not have access to the original program's source code, you may not have the ability to checkpoint at all. Checkpointing in HTC applications often involves storing intermediate state information about your running application into a shared directory (recall that most batch systems use a shared file system to ease the transfer of applications and data between resources behind the batch system). Your management script then needs to be able to detect when an application has failed and restart that program using the stored checkpoint. Imagine an application with the command line given in Fig. 8.11. Each run of this application takes an input file as a parameter describing the data to be analyzed. It also takes two additional parameters describing, respectively, the name of an output file to generate when the program is complete and the name of a checkpoint file to generate periodically as intermediate results become available.[10] Finally, the application takes an optional set of parameters that instruct the program to restart from an intermediate checkpoint file already available from a previous run. Given this application, we now revisit a job management script that we saw earlier in this chapter and modify it to work with our new application.

analyze-data <input file> <output file> <checkpoint file> \
    [--restart <checkpoint file>]

Figure 8.11 Example checkpointing command line.

[10] Implicit in this example is the assumption that the application binary removes checkpoint files as new checkpoints become available or as the program finishes successfully. If this is not the case, the control script needs to differentiate between checkpoint files that are still in use and those that are no longer needed.
If you compare this script with the first job management script given in this chapter, you can see that they are very similar to one another (Fig. 8.12). The only difference is that this script checks for the existence of a checkpoint file before submitting the job to the batch system. If the checkpoint file exists, then we use a different PBS submission template file (one that presumably uses the --restart version of the command). In this way, whenever we run the script, we will submit jobs to the queue, one for each input file that does not yet have a corresponding output file, with the restart option given whenever an appropriately named checkpoint file exists. As with the previous case, it is important to understand the difference
#!/bin/bash

# We make a directory to keep the submission scripts in just to
# keep our working directory from getting cluttered.
mkdir -p scripts

# Iterate over all the files in the input directory.
for INPUTPATH in input/*
do
    # For each file, determine what its name is (without the path)
    # as well as the name of the desired output file and a
    # checkpoint file.
    INPUTFILE=`basename $INPUTPATH`
    OUTPUTFILE=`echo $INPUTFILE | sed -e "s/input/output/g"`
    CPFILE=`echo $INPUTFILE | sed -e "s/input/checkpoint/g"`

    # If the output file does not exist, create and submit
    # a PBS job.
    if [ ! -e output/$OUTPUTFILE ]
    then
        # Before submitting a job, we first check to see if there
        # is an intermediate checkpoint to restart from.
        if [ -e checkpoints/$CPFILE ]
        then
            echo "Re-submitting job for input/$INPUTFILE"
            TEMPLATE=resubmission-template.pbs
        else
            echo "Submitting job for input/$INPUTFILE"
            TEMPLATE=submission-template.pbs
        fi

        qsub -v "INPUT=$INPUTFILE,CHECKPOINT=$CPFILE,OUTPUT=$OUTPUTFILE" \
            $TEMPLATE
    fi
done

Figure 8.12 Checkpointing example control script.
As with the previous case, it is important to understand the difference between a job that failed because of some transient problem and one that fails consistently because of bad inputs or bad data. If your program crashes every time it tries to work with a given data file, no amount of checkpointing and restarting will fix that issue.
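One way to encode that distinction in the control script above is a simple retry cap. The fragment below is our own addition, not part of the original figure: the retries/ bookkeeping directory and the limit of three attempts are arbitrary assumptions, and the fragment would replace the qsub line inside the loop.

# Hypothetical guard against endlessly resubmitting a job whose
# input is simply bad: track attempts per input file and give up
# after three failures.
mkdir -p retries
COUNT=0
if [ -e retries/$INPUTFILE ]
then
    COUNT=`cat retries/$INPUTFILE`
fi

if [ $COUNT -ge 3 ]
then
    echo "Giving up on input/$INPUTFILE after $COUNT attempts"
else
    echo $(( COUNT + 1 )) > retries/$INPUTFILE
    qsub -v "INPUT=$INPUTFILE,CHECKPOINT=$CPFILE,OUTPUT=$OUTPUTFILE" \
        $TEMPLATE
fi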
4.3. File staging

File staging is another advanced topic that is sometimes useful (and in fact sometimes required) for HTC applications. File staging is the act of copying a file in from a source to the compute node where the computation is taking place or, equivalently, copying a data file out from a compute node to a target location. There are many different ways to copy this data, including downloading it from the Web, copying it using ftp/sftp or rcp/scp, or even mailing a result file to an email address.

In some cases, you may have no choice but to copy the data for an HTC job. Despite the fact that most batch systems (PBS included) tend to rely on shared file systems being available, sometimes the data that you need is not available on those file systems, and sometimes it might be too large.
Maybe you have 1000 input files of 100 MB each and only 1 GB of disk space available (i.e., you have room to store a couple of input files at a time, but not enough to store all of them on the shared file systems).

Performance is the other main reason why people sometimes stage files in and out. When staging a file for performance, you are essentially paying an up-front cost to copy the file from a slow storage system (such as NFS) to a faster one (such as the local disk) so that you can use the faster storage medium for repeated reads later. Sometimes these repeated reads happen during the lifespan of a single program (e.g., the program may need to read a given file over and over again during its execution rather than read it once and store the information internally in memory). Other times, the file is reused as many different instances of a program are run for a given HTC application. Recall that having 1000 or 10,000 jobs to run does not mean that you automatically have access to an equivalent number of resources; generally, a batch system will run a few of your programs at a time and queue the rest until a resource becomes available. In this case, if you have a file that does not change between runs (often called a constant file), that file can be copied to local disk once and then repeatedly reused as other copies of the program are run.

The following example (Fig. 8.13) illustrates a PBS submission script for a movie CG-rendering program that takes not only an input frame to render and the output image to which to render it, but also a texture input indicating a database of scene textures to use for the frame. Since this texture database can be reused for other frames that may later get rendered on this node, we copy it to local disk space once and reuse the local copy from then on.11

File staging as it relates to creating local copies for performance reasons requires that you be aware of how and when the local disk space is cleaned up. If you are sharing the local disk space with other users and the compute setup does not somehow automatically clean up local disk space, then computing etiquette would suggest that you have a way of cleaning up the local copies when your HTC run is complete. Conversely, if the nodes in the cluster have a mechanism in place for automatically cleaning up local disk space (e.g., every time they reboot), that event must also be anticipated.
11 Note that /local-disk and /shared-disk are used only as exemplars in this example. Every organization has its own setup for its compute clusters, and each user will need to determine the geography of his or her compute environment.
#!/bin/bash

#PBS -q largeQueue

# Make sure our local scratch directory exists before copying into it.
mkdir -p /local-disk/jdoe

if [ ! -e /local-disk/jdoe/scene-textures.dat ]
then
    cp /shared-disk/jdoe/scene-textures.dat \
        /local-disk/jdoe/scene-textures.dat
fi

render-frame scene-1-frame-1.dat \
    --textures /local-disk/jdoe/scene-textures.dat \
    scene-1-frame-1.tiff

Figure 8.13 Submission script for file staging example.
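When no shared file system is available at all, the staging itself can be folded into the submission script. The following is a minimal sketch under our own assumptions: passwordless scp to an invented storage host (storage.example.org), a scratch directory named after PBS's standard $PBS_JOBID environment variable, and the analyze-data program from the checkpointing example, with INPUT and OUTPUT passed in via qsub -v as in the earlier control scripts.

#!/bin/bash

#PBS -q largeQueue

# Hypothetical stage-in/stage-out job: copy the input to local
# scratch, compute, copy the result back, and clean up after
# ourselves.
SCRATCH=/local-disk/jdoe/$PBS_JOBID
mkdir -p $SCRATCH

# Stage the input in from a storage server the compute node can reach.
scp storage.example.org:/archive/jdoe/$INPUT $SCRATCH/$INPUT

analyze-data $SCRATCH/$INPUT $SCRATCH/$OUTPUT $SCRATCH/checkpoint.dat

# Stage the result back out, then honor local-disk etiquette by
# removing the scratch directory.
scp $SCRATCH/$OUTPUT storage.example.org:/archive/jdoe/$OUTPUT
rm -rf $SCRATCH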
4.4. Grid systems

For the most part, the information given in this chapter is independent (except in specific syntax) of the system providing the cluster management. Whether you are talking about PBS, SGE, Condor, or a grid such as Globus (http://www.globus.org/toolkit/) or Genesis II (http://www.cs.virginia.edu/vcgr/wiki/index.php/The_Genesis_II_Project; Morgan, 2007), generally speaking there will be a way to execute jobs using a qsub-like mechanism, a way of querying information about running or completed jobs, and a way of killing or cleaning up jobs. However, there are a few differences between traditional batch systems and grids that are worth pointing out.

Grid systems, like batch queuing systems, give users the ability to start, monitor, and manage jobs on remote back-end resources. They differ from batch systems in the flexibility that they offer users, both in the types and numbers of resources and in the availability of tools for job management and control. Batch systems usually restrict users to clusters of similarly configured machines (generally, though not always, of the same operating system and make). They also typically back-end to resources under a single administrative domain, inevitably limiting the number of resources available for use. Grids, on the other hand, are designed to support greatly varying resource types from numerous administrative domains. It is not at all uncommon for a grid system to include resources from multiple universities, companies, or national labs, ranging in type from large supercomputers and workstations running variations of UNIX, to small desktop computers running Mac OS X or Windows, to clusters of computers sitting in racks in a machine room somewhere. In fact, a grid system will often contain among its resources other batch queuing systems.
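To make the "qsub-like mechanism" parallel concrete, here is the submit/query/kill lifecycle in two of the systems named above. The commands themselves are standard, but the job identifiers and file names shown are invented for illustration.

# PBS:
qsub job.pbs        # submit the job described in job.pbs
qstat 1234          # query the status of job 1234
qdel 1234           # kill or clean up job 1234

# Condor equivalents:
condor_submit job.sub
condor_q 42.0
condor_rm 42.0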
While many batch systems can front-end for heterogeneous compute nodes (i.e., compute nodes of differing architectures and operating systems), this capability is not generally put to use in most organizations. Usually, a given queue will submit jobs to only one type of compute node (sometimes identical in every regard, sometimes differing in insignificant ways such as hard drive size or clock speed). Grids, however, by their very nature tend to be quite diverse, supporting large numbers and types of resources ranging from Windows to Linux, desktop to rack-mount, and fast to slow. Sometimes the machines in a grid will have policies in place to prohibit execution while someone is logged in to the machine, and sometimes they will not. This diversity means that when you submit a job to a grid, you will often need to specify the resource constraints applicable to your job, such as what operating system it needs and how much memory it requires.

Given that grids support heterogeneous sets of machines, these machines are highly unlikely to share a file system (which, you will recall, was an outright assumption for most batch systems). Some grids do support shared namespaces and file systems through grid-specific device drivers, such as Genesis II's FUSE file system for Linux or its G-ICING Installable File System for Windows, but this is by no means guaranteed. Given this restriction, HTC applications running on a grid will often have no choice but to stage data in and out.

Another difference is that grids often support machines in wildly differing administrative domains and situations. When an HTC job runs on a cluster of machines in a controlled environment, such as a power-conditioned machine room, a user can be reasonably confident that the application will run for hours or even a day or more without interruption. However, when you start including machines in public computer labs at a university, or even those sitting in students' dorm rooms, the chances of a machine getting powered off or rebooted skyrocket. For this reason, when working with grids you will often need to be even more vigilant about picking appropriate job lengths and checkpointing.

Finally, in a grid system the chances of your application being installed on any given machine, or installed with the correct plug-ins, modules, or libraries that you need, become vanishingly small. For this reason, grids often include some sort of mechanism for application configuration and deployment.

While it may seem that using a grid instead of a compute cluster only complicates an already complex problem, it is important to realize that the benefits of grids can often outweigh these drawbacks. Grids are usually many orders of magnitude larger than clusters in terms of numbers of resources. They tend to be undersubscribed, whereas compute clusters are frequently oversubscribed. They also provide many features and functions that clusters simply cannot, such as data sharing and collaboration, fault tolerance, and Quality of Service (QoS) guarantees.
For many people, these benefits make the added complications worthwhile.
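As an illustration of the resource constraints mentioned above, here is a sketch of what a minimal Condor submit description file might look like. The requirement values are purely illustrative, not drawn from this chapter, and would need tuning for any real pool.

# Sketch of per-job resource constraints in a Condor submit
# description file; the thresholds shown are invented.
universe     = vanilla
executable   = analyze-data
arguments    = input.dat output.dat checkpoint.dat
requirements = (OpSys == "LINUX") && (Memory >= 2048)
queue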
5. Summary

In this chapter, we have provided a brief introduction to HTC techniques as they relate to the sciences. We have tried to describe some of the more common patterns in the hope that the examples are both illustrative and potentially useful. However, no single example can ever be a one-size-fits-all solution; every application has its own nuances and requirements, and each solution will of necessity tend to be unique to that application.

We have shown that a good working knowledge of scripting can be invaluable to an HTC user and that familiarity with basic tools such as grep, sed, and awk tremendously enhances the ways in which a user can manage and control his or her jobs. Finally, we have tried to provide enough of an introduction to more advanced HTC topics, such as staging and checkpointing, to give readers an idea of other areas they can explore if those topics seem relevant to their application space.

HTC has been and remains one of the more effective means of parallelization available to the researcher. A good understanding of these techniques and mechanisms will aid you as you produce not only future applications but also the data that you will one day analyze with those applications. And while it is an unfortunate fact of life that you must sometimes work with existing software over which you have little or no control, a working understanding of HTC techniques, combined with a little up-front planning, will help you simplify your HTC control and submission scripts.
REFERENCES

Chandra, R., Menon, R., Dagum, L., Kohr, D., Maydan, D., and McDonald, J. (2000). Parallel Programming in OpenMP. Morgan Kaufmann. ISBN 1-55860-671-8.

Gropp, W., Lusk, E., and Skjellum, A. (1994). Using MPI: Portable Parallel Programming with the Message-Passing Interface. Scientific and Engineering Computation Series. MIT Press, Cambridge, MA. ISBN 0-262-57104-8.

Ousterhout, J. K. (1994). Tcl and the Tk Toolkit. Addison-Wesley, Reading, MA. ISBN 0-201-63337-X.

Leach, P. J., and Naik, D. C. (1997). A Common Internet File System (CIFS/1.0) Protocol. Internet Draft, 19 December. http://tools.ietf.org/html/draft-leach-cifs-v1-spec-01.txt.
Morgan, M. M. (2007). Genesis II: Motivation, Architecture, and Experiences Using Emerging Web and OGF Standards. In Proceedings of the Seventh IEEE International Symposium on Cluster Computing and the Grid, May 2007. ISBN 0-7695-2833-3.

Sun Microsystems, Inc. (1989). NFS: Network Filesystem Protocol Specification. IETF RFC 1094, March.

Thain, D., Tannenbaum, T., and Livny, M. (2005). Distributed Computing in Practice: The Condor Experience. Concurrency Comput. Pract. Exp. 17(2–4), 323–356. doi:10.1002/cpe.938.