Parallel Computing 33 (2007) 250–263 www.elsevier.com/locate/parco
Parameter optimization in 3D reconstruction on a large scale grid

J.R. Bilbao-Castro a,b, A. Merino a, I. García b, J.M. Carazo a, J.J. Fernández b,*

a Biocomputing Unit, Centro Nacional de Biotecnología, Universidad Autónoma, 28049 Madrid, Spain
b Dept. Arquitectura de Computadores y Electrónica, Universidad de Almería, 04120 Almería, Spain

* Corresponding author. Tel.: +34 9500 157 11; fax: +34 9500 154 86. E-mail address: [email protected] (J.J. Fernández).
Available online 20 February 2007
Abstract

Reaching as high a structural resolution as possible in 3D electron microscopy of biological specimens is crucial to understanding their function and interactions. Technical and biological limitations make electron microscopy projections of such specimens quite noisy. Under those circumstances, the widely used Weighted Back-Projection algorithm presents some limitations for 3D reconstruction. Iterative tomographic reconstruction algorithms are well suited to provide high resolution 3D structures under such noisy conditions. Nevertheless, these iterative algorithms present two major challenges: computational expensiveness and some free parameters which need to be correctly tuned to obtain the best possible resolution. This work applies global optimization techniques to search for the optimal set of parameters and makes use of the high-throughput capabilities of grid computing to perform the required computations. Fault tolerance techniques have been included in our application to deal with the dynamic nature and complexity of large scale computational grids. The approach for parameter optimization presented here has been successfully evaluated on the European EGEE grid, obtaining good levels of speedup, throughput and transfer rates.
© 2007 Elsevier B.V. All rights reserved.

Keywords: Grid computing; Iterative reconstruction algorithms; 3D reconstruction; Global optimization; Parameter optimization; EGEE
1. Introduction

Knowledge of the three-dimensional (3D) structure of biological specimens is critical to understanding their function at all scales [1]. Electron microscopy (EM) allows the investigation of the structure of biological specimens over a wide range of sizes and resolutions [2]. The 3D structure can be derived from a set of projection images taken from the specimen at different orientations with an electron microscope. The aim of 3D reconstruction algorithms is then to obtain a 3D representation of the specimen by gathering the two-dimensional information present in a number of its projection images [2,3].

There exist two main families of 3D reconstruction algorithms: Fourier-based and iterative. The so-called Weighted Back-Projection (WBP) algorithm belongs to the former and is currently the standard in the 3D
electron microscopy (3DEM) field due to its low computational cost [4]. However, it presents important disadvantages, especially when the projections do not cover the projection space reasonably well and/or when the signal-to-noise ratio (SNR) is very low. Under those circumstances, the so-called iterative, algebraic reconstruction techniques arise as an attractive alternative [5]. They perform better than WBP not only when the projection space is not well covered but also when SNR conditions are extremely bad [6]. An important drawback is that these iterative algorithms are computationally very expensive. Also, they require the correct selection of some parameters in order to obtain the best possible results. The process of optimizing these parameters involves a tremendous computational burden. Although parallel computing strategies have already been developed to overcome the computational issues [7,8], parameter optimization has not been addressed yet. On the other hand, the application of grid computing [9–13] has been analyzed and considered suitable for both the reconstruction itself and parameter optimization [14,15].

Conceptually, iterative 3D reconstruction methods proceed by progressively refining a 3D model as iterations evolve (Fig. 1). In every iteration, (1) projections are computed from the current model; (2) the error between the experimental projections and the computed ones is calculated; (3) the model is refined so that the error is minimized through "error backprojection". The relaxation parameter λ adjusts the refinement at each step. Expressed analytically in a simplified manner, an iterative reconstruction method can be described via its iterative step from the kth estimate to the (k + 1)th estimate:

x_j^{k+1} = x_j^k + \lambda \sum_{i=1}^{I} l_{i,j} \frac{y_i - \sum_{v=1}^{J} l_{i,v} x_v^k}{\sum_{v=1}^{J} l_{i,v}^2}    (1)

where x_j denotes a volume element, y_i is an experimental projection pixel, and l_{i,v} is the intersection of the ith ray with the vth volume element. The λ parameter controls the convergence rate of the algorithm. Its optimal value depends on the characteristics of each reconstruction (number of projections, SNR, etc.). If the λ value is too low, the algorithm converges too slowly and needs many iterations to reach the final result. On the other hand, too high a value may make the algorithm diverge, so the optimal result would never be reached. Therefore, correctly selecting this parameter is of paramount importance to obtain optimal high resolution reconstructions [5,6].

In 3D reconstruction, the basis functions used to represent the object to be reconstructed greatly influence the result of the algorithm [16]. The typical volume representation relies on cubic volume elements called voxels. Voxels present a constant value within their domain and zero outside it. Alternatively, spherically symmetric volume elements (called blobs) [16–18] have turned out to be the most suitable basis functions for representing natural structures and are especially suited for working under the extremely noisy conditions found in EM [6,19]. Blobs are smooth functions with a maximum value at their center which gradually decays to zero at their extreme limits, following a Gaussian-like distribution (Fig. 2), as given by Eq. (2) below.
Fig. 1. Conceptual scheme of iterative 3D reconstruction algorithms.
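To make the iterative step in Eq. (1) more concrete, the following minimal sketch (not the authors' implementation) applies the relaxed update to a tiny dense system; the matrix L of ray/voxel intersection lengths, the projection vector y and all names are illustrative assumptions.

```python
import numpy as np

def sirt_step(x, y, L, lam):
    """One relaxed iteration of Eq. (1).
    x: current volume estimate (J elements), y: measured projection pixels (I elements),
    L: I x J matrix of ray/voxel intersection lengths, lam: relaxation parameter (lambda)."""
    residual = y - L @ x                               # error between measured and computed projections
    row_norms = np.sum(L ** 2, axis=1)                 # sum_v l_iv^2 for every ray i
    correction = L.T @ (residual / np.maximum(row_norms, 1e-12))
    return x + lam * correction

# toy usage: 3 rays, 4 volume elements (made-up numbers)
L = np.array([[1.0, 0.5, 0.0, 0.0],
              [0.0, 1.0, 1.0, 0.0],
              [0.0, 0.0, 0.5, 1.0]])
x_true = np.array([1.0, 2.0, 0.5, 1.5])
y = L @ x_true
x = np.zeros(4)
for _ in range(200):
    x = sirt_step(x, y, L, lam=0.1)
print(float(np.linalg.norm(y - L @ x)))                # residual shrinks towards zero
```

With a small λ the residual between measured and computed projections shrinks steadily, whereas too large a value would make the iteration diverge, which is exactly why λ must be tuned.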
Fig. 2. Blobs. Right: profile of a blob, a continuous function with a smooth transition to zero. The radius a of the blob and the α parameter determine the global shape of the blob. Center: arrangement of overlapping blobs on a simple cubic grid. Left: for comparison purposes, arrangement of voxels on the same grid.
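As a numerical companion to the blob definition given in Eq. (2) below, the following hedged sketch evaluates the profile of a generalized Kaiser–Bessel blob; it assumes SciPy's modified Bessel function iv and uses the standard values a = 2.0 and α = 10.4 quoted later in the paper.

```python
import numpy as np
from scipy.special import iv  # modified Bessel function of the first kind, I_m

def blob(r, a=2.0, alpha=10.4, m=2):
    """Generalized Kaiser-Bessel blob of Eq. (2): a is the radius,
    alpha controls the shape, m is the Bessel order (typically 2)."""
    r = np.asarray(r, dtype=float)
    s = np.sqrt(np.clip(1.0 - (r / a) ** 2, 0.0, None))   # sqrt(1-(r/a)^2), clipped to 0 outside the radius
    return np.where(r <= a, (s ** m) * iv(m, alpha * s) / iv(m, alpha), 0.0)

# profile: maximum (1.0) at the centre, smooth decay to zero at r = a
print(np.round(blob(np.linspace(0.0, 2.5, 6)), 4))
```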
b(m, \alpha, a; r) = \frac{I_m\left(\alpha \sqrt{1 - (r/a)^2}\right)}{I_m(\alpha)} \left(\sqrt{1 - (r/a)^2}\right)^m    (2)
where r denotes the radial distance from the blob center (the function is zero for r > a); a is the radius of the blob; α determines the global shape of the blob (the higher the value, the narrower and spikier the blob, and vice versa); and finally I_m is the modified Bessel function of order m, with m typically 2. The most important parameters defining the blob basis function are a and α.

Therefore, the relaxation parameter λ of the algorithm as well as the blob parameters a and α should be optimized in order to achieve a 3D reconstruction at as high a resolution as possible. Finding this set of optimal parameters is a global optimization problem (GOP). So far, the selection of parameters has been based on previous experience and trial-and-error tests (e.g. [20]). A schematic approach to find the optimal parameters could be as follows (see the sketch after the list):

• Select a number of sets of parameters to test. Each set of parameters represents a 3D reconstruction.
• The 3D reconstruction associated with each set of parameters is computed.
• The quality of the 3D reconstructions is objectively measured.
• Based on the quality measures, new sets of parameters are selected, starting the procedure again.
• This procedure is repeated until user-defined stop conditions are reached.
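The loop below is only an illustration of the schematic procedure above, not the application's actual code; reconstruct, assess and sample_parameters are placeholder names standing for the 3D reconstruction job, the quality measure (e.g. the FSC-based resolution) and the strategy that proposes new parameter sets.

```python
import random

def optimize(reconstruct, assess, sample_parameters, n_sets=20, max_rounds=10):
    """Sketch of the schematic procedure: propose parameter sets, reconstruct,
    measure quality, and keep the best set until a stop condition is met."""
    best = None
    for _ in range(max_rounds):                        # user-defined stop condition
        candidates = [sample_parameters(best) for _ in range(n_sets)]
        for params in candidates:                      # each set implies one 3D reconstruction
            volume = reconstruct(params)
            quality = assess(volume)                   # objective function (e.g. FSC resolution)
            if best is None or quality > best[0]:
                best = (quality, params)
    return best

# toy usage with a synthetic objective whose optimum is near lambda = 0.002
toy = optimize(
    reconstruct=lambda p: p,
    assess=lambda p: -abs(p["lam"] - 0.002),
    sample_parameters=lambda best: {"lam": random.uniform(1e-5, 0.05)},
)
print(toy)
```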
Doing this by hand is a tedious, cumbersome and very long process. The approach presented in this paper automates the optimization procedure, thus reducing this time and avoiding the need for an operator to drive the process. To achieve this, a multi-parameter global optimization algorithm and grid computing technologies have been put together. The optimization algorithm selects appropriate points to test and grid computing provides the computational resources needed to perform any CPU-time consuming task involved. There have been previous works that combine parallel computing and global optimization techniques [21–23]. Recently, grid computing has also started to be used for this purpose [24,25]. Our approach represents the first work to put into practice the combination of grid computing and global optimization to address an important bioinformatics problem.

2. Assessing the quality of 3D reconstructions – FSC

To perform the optimization process it is necessary to objectively evaluate the quality of the 3D reconstructions. Comparing quality values between 3D reconstructions allows us to decide which set of parameters is
performing better. The function used to evaluate the quality of the reconstructions is usually called the "objective function" in the field of global optimization.

When the original specimen's volume is known, quality estimation is a straightforward task which consists of measuring differences between the reconstructed volume and the original one by means of so-called figures-of-merit (FOMs) [19,20]. Nevertheless, in experimental environments the original specimen's structure remains unknown and there is nothing to compare the reconstruction to. In this case, a different alternative must be used. Fourier shell correlation (FSC) allows us to estimate the resolution of a reconstruction without the need of a model to compare to [26,27]. Here, the set of projections to be used for the reconstruction is split into two halves. Each half should have an equal number of randomly selected projections to ensure that all the projection space is covered by each half. Then, a reconstruction is performed using each half of the projections. Both reconstructions, say f and g, are then used to compute the FSC:

FSC(\hat{f}) = \frac{\sum_{n \in R} F_n G_n^{*}}{\left(\sum_{n \in R} |F_n|^2 \sum_{n \in R} |G_n|^2\right)^{1/2}}    (3)
where \hat{f} denotes a frequency in Fourier space; F_n and G_n represent the components of the Fourier transforms of the volumes f and g; G_n^{*} is the complex conjugate of G_n; and R(\hat{f}) is the shell corresponding to frequency \hat{f}. Further information on the FSC can be found in [26,27]. Therefore, FSC measures the correlation between both reconstructions for the different frequency shells. When, for a certain shell, the correlation value falls below a selected threshold (usually, and for this work, 0.5), that frequency is considered to be the highest one for which the result is reliable and thus an estimated maximum resolution value can be determined [3]. This means that finer details (higher frequencies) are less likely to represent signal power and should be considered as strongly influenced by the noise power. FSC is, because of its general applicability, the quality metric selected for this work.

3. A global optimization algorithm – DCRS

There exist many alternatives to tackle the optimization of an objective function [28,29]. Exhaustive search approaches are not usually a practical and affordable alternative. Instead, when the problem to optimize can be mathematically described, the solution can be reached by using deterministic algorithms. However, the mathematical description of the problem under consideration here remains unknown and thus stochastic global optimization algorithms are the option of choice. Stochastic algorithms progressively reduce the search space by centering on those sub-domains where the algorithms consider that the optima are more likely to be. The Delayed Controlled Random Search algorithm (DCRS) [30] is the stochastic algorithm selected for this work. DCRS is based on the Controlled Random Search (CRS) algorithm [31,32] and has some interesting properties which make it fit our problem:

• It stores a population of points, where a point represents a set of N parameters to optimize (which implies an N-dimensional search space). This allows the algorithm to store previously evaluated points, and makes it possible to go backwards if the algorithm falls into a local optimum.
• It is inherently parallel. Several points can be evaluated at the same time, which means a more detailed and faster exploration of the search domain.
• When generating new search points, it distinguishes between the so-called primary and secondary points. Primary points tend to expand the search space so that the algorithm can avoid falling into a local optimum. On the other hand, secondary points concentrate the search on smaller parts of the domain, thereby trying to find the global optimum. At the beginning, the algorithm mainly uses primary points in order to spread the search across the space. As the algorithm evolves, the ratio of primary to secondary points progressively decreases in order to concentrate the search for the global optimum.

Fig. 3 illustrates how the algorithm calculates new points.
Fig. 3. Representation of the calculation of a primary point (left) and a secondary point (right). First, four points are randomly selected from the population of stored points. The centroid (represented as a triangle) of three of these points (represented as circles) is computed. The calculation of the new point (represented as a square) depends on the type of point. If a primary point is being computed, it is the reflection of the fourth point (represented as a cross) with respect to the centroid. If the point to compute is a secondary point, it is calculated as the average of the fourth point and the centroid.
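The geometric construction described in Fig. 3 can be sketched as follows; this is an illustrative reading of the figure, not the reference DCRS code, and the toy population of (λ, a, α) points is made up.

```python
import numpy as np

def new_point(population, primary, rng=None):
    """Generate a trial point as in Fig. 3: pick four stored points, take the
    centroid of three of them, then either reflect the fourth point through the
    centroid (primary point) or average it with the centroid (secondary point)."""
    rng = rng if rng is not None else np.random.default_rng()
    idx = rng.choice(len(population), size=4, replace=False)
    p1, p2, p3, p4 = (np.asarray(population[i], dtype=float) for i in idx)
    centroid = (p1 + p2 + p3) / 3.0
    if primary:
        return 2.0 * centroid - p4    # symmetric point of p4 with respect to the centroid
    return (centroid + p4) / 2.0      # midpoint between p4 and the centroid

# toy population of (lambda, a, alpha) points
pop = [(0.002, 2.0, 10.4), (0.010, 1.8, 9.0), (0.030, 2.2, 11.0),
       (0.005, 1.5, 8.0), (0.020, 2.4, 12.0)]
print(new_point(pop, primary=True))
print(new_point(pop, primary=False))
```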
Briefly and in general terms, the DCRS algorithm acts as follows:

(1) Initialize: generate a number of randomly distributed points (R1, . . . , RN) covering the search domain.
(2) While a stop criterion is not fulfilled:
    (a) Let RW be the point presenting the worst objective function value f(RW).
    (b) Generate a new point (primary or secondary), P.
    (c) If f(P) is better than f(RW), then replace RW with P.
    (d) Else, go to 2(b).
(3) Return result = (R1, . . . , RN).

DCRS performs the simultaneous evaluation of several points at Step (2), which makes it attractive for a parallel implementation. With regard to the stop criteria, there are multiple user-defined options that can act as stop conditions, such as the maximum number of points to test, the precision wanted for the resolution of the points in the population, the maximum Euclidean distance between points in the search domain, etc. A detailed description of CRS and DCRS can be found elsewhere [30–32].

In this work, we intend to find the optimal values for the relaxation parameter λ of the iterative 3D reconstruction algorithm and the parameters a and α of the blob basis function that yield the best 3D structure. So, each point R in the algorithm represents a set of values for these parameters. As stated above, FSC is the metric used as a standard quality measurement in the field of 3DEM. Here, the resolution given by the FSC is the objective function to optimize.

4. A large scale grid computing platform – EGEE

To evaluate the quality provided by a set of parameters, the reconstruction using those parameters must be performed. As previously commented, iterative reconstruction algorithms have high requirements in terms of computing time. Also, the process of optimizing the parameters requires the execution of a high number of reconstructions and FSC evaluations. It is very difficult to gain access to a dedicated platform (e.g. a cluster or supercomputer) to perform all the involved computations. Instead, the grid [9] offers 24/7 access to a computing platform where thousands of CPUs are virtually available. The Enabling Grids for E-sciencE program (EGEE, and EGEE-II since May 2006) [33–35] offers a large scale grid infrastructure which combines about 30 K CPUs, 5 Petabytes of storage space and high-speed, high-capacity interconnection networks. The EGEE Consortium consists of 90 partners from 32 European, Asian and American countries. Users of the EGEE grid are grouped under so-called virtual organizations.
Briefly, among other services, the EGEE grid architecture provides:

• User interface (UI): machines configured as the users' gateway to the grid. This is the only part of the grid which the user physically interacts with.
• Resource brokers (RB): they provide information about potential execution hosts based on our job requirements. They also monitor job status on the grid.
• Computing elements (CE): nodes configured to act as job execution hosts.
• Storage elements (SE): machines specially configured to store data.

5. Parameter optimization for 3D reconstruction on a computational grid

In this work, an application has been developed that combines the DCRS algorithm and the EGEE grid computing infrastructure for parameter optimization of 3D reconstruction algorithms. The application has been written using C++, the Qt library for the user interface and EGEE system calls for the interaction with the grid. The application could easily be extended to grid platforms other than EGEE.

5.1. Global scheme

Fig. 4 shows a simplified scheme of the developed application, with the most important logical blocks represented. The user must provide, through the user interface, some data needed to perform the optimization process. These data can be classified as follows:

• Configuration data: all the parameters which control the execution of the optimization process, e.g. the stop conditions for the DCRS algorithm.
• Executables: the independent binary programs that perform the 3D reconstructions and FSC evaluations, which will be sent through the grid to wherever they are required during the optimization process.
• Parameters to optimize: here, the user provides a description of the parameters to be optimized and the range of values where the optimizer must center its work (e.g. [0.00001, 0.05]).
• Projections: the whole set of electron microscope projections to be used in the optimization process.
Fig. 4. This figure represents a general scheme of the developed application. The user interacts with the application through a user interface, providing the input data and configuration and getting back the optimized parameters and the progress of the process. All grid operations are mediated through the grid interface, which hides the complexity of such interactions. An optimizer block provides new sets of parameters to test.
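For illustration only, the four kinds of user input listed above could be gathered in a small container such as the one below; the field names, binary names and parameter bounds (except the example range [0.00001, 0.05] quoted above) are hypothetical and do not reflect the actual application's data model.

```python
from dataclasses import dataclass, field

@dataclass
class OptimizationSetup:
    """Illustrative container for the user-supplied inputs (not the real interface)."""
    stop_after_points: int = 200                 # configuration data: a DCRS stop condition
    fsc_threshold: float = 0.5                   # configuration data: resolution criterion
    executables: dict = field(default_factory=lambda: {
        "reconstruct": "art_blobs",              # hypothetical binary names
        "fsc": "compute_fsc",
    })
    parameter_ranges: dict = field(default_factory=lambda: {
        "lambda": (0.00001, 0.05),               # range quoted in the text
        "a": (1.0, 2.5),                         # blob radius (illustrative bounds)
        "alpha": (4.0, 12.0),                    # blob shape (illustrative bounds)
    })
    projections: str = "projections.stk"         # hypothetical path to the projection set

setup = OptimizationSetup()
print(setup.parameter_ranges["lambda"])
```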
The Optimizer block consists of the DCRS algorithm (or potentially any other optimization algorithm). It stores the optimization progress and generates new sets of parameters to test. These new sets are passed on to the kernel, which communicates them to the grid interface. Another important task of the optimizer is to decide whether the optimization must finish or continue. If a stop condition is fulfilled, the kernel asks the grid interface to cancel all running jobs and grid transfers, presenting, through the user interface, the results and statistics of the optimization process.

The grid interface takes the new sets of parameters to test and builds up all the grid job definitions (written using a Job Description Language (JDL)). The jobs are then added to a prioritized queue. This queue contains not only 3D reconstruction jobs but also FSC computation jobs. The grid interface also manages all grid file transfers, such as UI to SE, SE to SE, and SE to UI transfers.

5.2. File replication

File replication allows a single file to be replicated to multiple storage elements of the grid [36]. This has two important impacts:

• Implicit fault tolerance: if a storage element fails, there will always be an alternative one to get the needed file from.
• Better overall performance [37]: multiple grid computing elements trying to retrieve data from the same storage element means very low transfer rates and poor global performance. Replication allows different computing elements to obtain the same data from different storage elements, avoiding possible bottlenecks and thus obtaining a much better transfer performance.

The application uses two different replication mechanisms. First, on starting the optimization process and before sending any job to the grid, projections and executables are replicated across a user-defined number of SEs. This replication process follows a tree-like scheme: data are first sent to a single SE from our UI, then replicated from that SE to two other SEs, and so on. The other replication mechanism is the one used to replicate the results from remote executions (reconstructed volumes and FSC results). Once a remote execution is finished, the result is replicated to three different SEs. A reconstructed volume is not sent back to the UI. Instead, it is stored in the grid until an FSC job requires it. The application recovers the FSC result from the grid as soon as an FSC job finishes.

5.3. Job management

Once the DCRS algorithm has selected a new set of parameters, it is passed on to the grid manager. As described in Section 2, resolution assessment requires the set of projections to be split into two groups, two independent 3D reconstructions to be performed and, finally, the FSC between them to be evaluated. Therefore, for each set of parameters we need to perform two reconstructions, each using one half of the projections. This means that an FSC job will only be submitted to the grid when both reconstructions are finished. The application manages a queue of groups of jobs. Each group can contain from one to many jobs. For each set of parameters a group is created containing two reconstruction jobs. Once the whole group is finished, the FSC job (in this case a group containing a single job) is sent to the grid. The application starts groups based on a simple priority scheme: a group containing FSC jobs will always be started before a group containing reconstructions.
This is to avoid the execution of a long-lasting, potentially useless 3D reconstruction when, depending on the value returned by the FSC, the optimization might already be finished. Each job on the grid is tightly monitored by the application and the status information is presented to the user. The status of the jobs is obtained from the Resource Broker with a certain user-selectable periodicity. If a job fails, or it is queued for a long time on the CE, it is resubmitted to the grid. This adds an extra level of fault tolerance [38], which is very important in large scale computational facilities like the EGEE grid. Different scheduling policies can easily be implemented to extend the application.
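A minimal sketch of the job-group priority scheme just described is given below; it is not the application's C++ implementation, and the group contents are simplified placeholders.

```python
from collections import deque

class JobGroupQueue:
    """Sketch of the priority scheme: FSC job groups are always started
    before reconstruction job groups (illustrative, not the real code)."""
    def __init__(self):
        self.fsc_groups = deque()
        self.reconstruction_groups = deque()

    def add_reconstruction_pair(self, params):
        # one group with the two half-set reconstructions for a parameter set
        self.reconstruction_groups.append(("recon", params, ["half1", "half2"]))

    def add_fsc(self, params):
        # single-job group that compares the two finished reconstructions
        self.fsc_groups.append(("fsc", params, ["fsc"]))

    def next_group(self):
        if self.fsc_groups:                      # FSC groups take priority
            return self.fsc_groups.popleft()
        if self.reconstruction_groups:
            return self.reconstruction_groups.popleft()
        return None

q = JobGroupQueue()
q.add_reconstruction_pair({"lambda": 0.002})
q.add_fsc({"lambda": 0.001})
print(q.next_group()[0])   # -> 'fsc'
```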
6. Experimental evaluation

Two different aspects of the optimization process are analyzed in this paper: first, the computational performance yielded by the grid and, second, quality measurements of the reconstructions selected as optimal.

6.1. Performance evaluation

In the field of grid computing, performance evaluation is still a matter of study [15,39–42]. There is a lack of comprehensive performance metrics due to the high complexity and dynamic nature of grids. A speedup-like metric has been used so far to evaluate the performance of an application on a grid [39,40,42,15]. Essentially, it consists in computing the ratio of the potential turnaround time of the application on a single node of the grid (i.e., the "sequential time") to the turnaround time when it is executed on the grid. In this work, we have defined the "grid speed gain" (GSG), a speedup-like metric that accounts for the two types of jobs that our application performs on the grid, reconstructions and FSC evaluations:

GSG = \frac{N_{Recons}\, t_{Recons} + N_{FSC}\, t_{FSC}}{t_{grid}}    (4)

where t_grid denotes the execution time on the grid for an optimization process; N_Recons and N_FSC represent the number of jobs of each kind that were performed; and t_Recons and t_FSC their average execution times during the optimization. The numerator of Eq. (4) represents an estimate of the sequential time that the whole process could have taken on a single, average node of the grid. The denominator is the experimental turnaround time measured as a result of the execution on the grid. Other metrics allowing quantitative performance assessment of the application are the number of finished jobs per unit time, the average time waiting in queue, resubmission rates and data transfer rates [15].

6.2. Models

Models of two biological specimens were used in the experiments: first, Bacteriorhodopsin, the first biological structure determined at atomic resolution through 3D electron microscopy [43]; second, the connector from the bacteriophage virus T7 [44], a component with a key role in viral morphogenesis. A total of 6144 projections were created from each model by simulating the projection process in the electron microscope (Fig. 5). The projections were 100 × 100 (Bacteriorhodopsin) and 128 × 128 (connector) pixels in size, with an estimated SNR of 1/3 to resemble experimental conditions in 3DEM.
Fig. 5. Left: two views of the T7 connector model. Center: the same views of the Bacteriorhodopsin model. Right: three projections of the Bacteriorhodopsin model obtained by simulating an electron microscope, with an SNR of 1/3.
6.3. Experiments and results

Two experiments were performed for each model. First, a single parameter optimization (SPO) of the relaxation parameter λ, where the blob parameters were fixed to the standard values a = 2.0 and α = 10.4 [6]. Second, a multiple parameter optimization (MPO) of λ and the blob parameters a and α. In the SPO experiments, the maximum number of sets of parameters to test in parallel in a single iteration of DCRS was limited to 20. This involves 40 reconstruction jobs and 20 FSC evaluations, and an ideal maximum GSG of 20. For MPO, this limit was set to 40 (i.e. 80 reconstructions and 40 FSC jobs), with a potential maximum GSG of 40. In this work, the queue time-out was set to 90 min: jobs that were queued in CEs for longer were resubmitted to the grid. Table 1 summarizes the results of the performance evaluation.

6.3.1. Single parameter optimization

For the Bacteriorhodopsin experiment, a total of 420 jobs were completed on the grid: 280 jobs corresponding to 3D reconstructions and the 140 associated FSC jobs. With a resubmission rate of around 20%, the total number of jobs sent to the grid was 504. These jobs implied the transfer of about 88 GB of data files (projections, 3D reconstructions and FSC results). The average execution time for 3D reconstructions was 90 min while the FSC jobs took an average of 15 min. This adds up to a total of 455 h as the estimated sequential time. As it took 44 h to complete the optimization, the GSG (Eq. (4)) turned out to be around 10. The execution rate proved to be 9.55 jobs/h. Taking into account the amount of data transferred and the optimization time, the average data transfer rate was 0.5 MB/s.

The Connector experiment required a total of 816 jobs (544 jobs for 3D reconstructions and 272 FSC evaluations). With a resubmission rate of 30%, the total number of jobs sent to the grid was 1061. In this case, the increased number of jobs and the bigger datasets (bigger projections and bigger 3D reconstructed volumes) made the total amount of data transferred scale up to 284 GB. The average time for 3D reconstruction jobs was 120 min while the FSC jobs took an average of 18 min, which implies a total of 1170 h of estimated sequential time. As the optimization process took 92 h, the GSG was around 12.7 and the execution rate was 8.87 jobs/h. The average data transfer rate was 0.86 MB/s.

In terms of reconstruction quality, Fig. 6 shows the λ values tested during each optimization process and the resolution values obtained for each one. In both cases the optimal resolution, 6.243 Å and 11.464 Å for Bacteriorhodopsin and Connector respectively, was reached for low values of λ, 0.001576 and 0.003119, respectively.
Table 1
Computational statistics extracted from the performed experiments

Metrics                                        B_SPO    C_SPO    B_MPO    C_MPO
Maximum parallel sets of parameters tested        20       20       40       40
Completed jobs                                   420      816      900      705
Job resubmission rate (%)                         20       30       20       40
Total submitted jobs                             504     1061     1080      987
Data transferred (GB)                             88      284      181      265
Average reconstruction execution (min)            90      120       90      222
Average FSC execution (min)                       15       18       15       24
Average job waiting (min)                         21       24       21       51
Estimated sequential time (h)                    455     1170      975     1833
Total execution time (h)                          44       92       51      120
GSG                                               10     12.7       19    15.27
Execution rate (jobs/h)                         9.55     8.87    17.65     5.88
Average data transfer rate (MB/s)                0.5     0.86     0.98     0.61

B_SPO, B_MPO, C_SPO, C_MPO denote the experiments, where B stands for Bacteriorhodopsin and C for Connector. The subscript denotes the type of experiment: SPO stands for single parameter optimization, where only λ was optimized, and MPO stands for multiple parameter optimization, where λ and the blob parameters were optimized.
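As a quick sanity check of Eq. (4), the B_SPO column of Table 1 reproduces the figures quoted in the text (455 h of estimated sequential time and a GSG of about 10):

```python
# Checking Eq. (4) against the B_SPO experiment (values from the text and Table 1)
n_recons, t_recons = 280, 90      # reconstruction jobs and their average time (min)
n_fsc, t_fsc = 140, 15            # FSC jobs and their average time (min)
t_grid_hours = 44                 # measured turnaround time on the grid (h)

sequential_hours = (n_recons * t_recons + n_fsc * t_fsc) / 60.0
gsg = sequential_hours / t_grid_hours
print(sequential_hours, round(gsg, 1))   # 455.0 hours, GSG of about 10.3
```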
Fig. 6. These graphics show the relationship between the λ parameter (horizontal scale) and the estimated resolution in 1/Å (vertical scale). Resolution values should be multiplied by the sampling of each model (1 Å/pixel for Bacteriorhodopsin and 2.18 Å/pixel for the connector) to obtain normalized resolution in Å.
Starting from λ = 0, the resolution curves smoothly improve up to a maximum, from which they start falling rapidly. The case of the connector is especially interesting: a resolution gap appears just after the optimal resolution values are reached. In this case, optimizing the λ value is of paramount interest, as selecting a λ value slightly greater than the optimal one would yield a much poorer resolution.

6.3.2. Multiple parameter optimization

The Bacteriorhodopsin experiment needed a total of 900 jobs (600 reconstructions and 300 FSC evaluations). A resubmission rate of 20% resulted in a total of 1080 jobs sent to the grid. Around 181 GB of data was moved across the grid for this experiment. The average execution times for 3D reconstructions and FSC evaluations were 90 min and 15 min, respectively. The estimated total sequential time was then 975 h. The turnaround time of the optimization process on the grid was 51 h, resulting in a GSG of 19 and an execution rate of 17.65 jobs/h. The average data transfer rate was 0.98 MB/s.

The Connector experiment required a total of 705 jobs. In this case, the heavy load of the grid during the experiment [45] made the job resubmission rate scale up to 40%, with a total of 987 jobs submitted. A total of 265 GB of data were transferred during the experiment. The average execution times for 3D reconstruction jobs and FSC jobs were 222 min and 24 min, respectively. The estimated sequential time was then 1833 h, whereas the experimental turnaround time for the whole optimization on the grid was 120 h, yielding a GSG of 15.28 and an execution rate of 5.88 jobs/h. The average data transfer rate was 0.61 MB/s.

Multiple parameter optimization showed interesting results in terms of the quality of the reconstructions. Fig. 7 shows how the resolution obtained for each set of parameters is strongly influenced by the λ parameter. The best resolution figures are always obtained for the lower λ values (values near those obtained for SPO), while the other parameters (a and α) do not seem to have an important effect. Although there are slight variations in quality with a and α, the difference in the reconstruction with regard to the standard values is negligible. The optimal resolution was 6.241 Å and 11.464 Å for Bacteriorhodopsin and Connector, respectively.

6.4. Discussion

Performance evaluation of the application presented here has clearly shown the benefits of the computational capabilities of grid computing. For SPO, excellent GSGs ranging around 10–12 (out of a maximum of 20) were obtained. For MPO, GSGs were around 15–19 (out of a maximum of 40). The increase in the number of parallel sets of parameters to test has improved the GSG (e.g. from 10 to 19 in the case of Bacteriorhodopsin).
[Fig. 7 consists of four surface plots showing the estimated resolution (1/Å, vertical axis) as a function of λ and the blob radius a, and of λ and α.]
Fig. 7. Upper row shows the relationship between the sets of parameters tested during the optimization process and the resolution obtained for each one (1/Å) for the Bacteriorhodopsin model. Lower row shows the same relationships obtained for the Connector model.
However, the increase is not linear because of the job management, grid latencies, and other grid factors involved [15]. The performance results have also shown the varying nature of the grid, particularly in the case of the Connector. The change in the average computing times for reconstruction and FSC in the case of MPO with respect to SPO (222 min vs 120 min, and 24 min vs 18 min, respectively) is noticeable. Similar changes have also been observed in the average data transfer rate, which decreased from 0.86 to 0.61 MB/s, and in the resubmission rate, which increased from 30% to 40%. The experiments with the Connector were carried out on dates with very different computational loads on the grid [45]. These figures show the big impact that the grid load may have on the job execution time and the global grid performance, and clearly exhibit the dynamic character of the grid.

The results also show the high-throughput capabilities of grid computing, particularly for the SPO and Bacteriorhodopsin experiments. SPO experiments showed an execution rate of around 9 jobs/h, regardless of the model. The increase in the number of parallel parameter sets in MPO made this rate increase up to 17.65 jobs/h for Bacteriorhodopsin. However, for the Connector, this rate fell to 5.88 jobs/h in the case of MPO due to the problems just mentioned.

From the user standpoint, and based on our experience during this work on the EGEE grid, grid usability depends on a wide range of factors, from the state of the user interface machine to the configuration of the different grid elements and the grid load at every moment. In particular, in this work we have had to deal with stability problems in performance coming from specific but recurrent situations such as (1) grid saturation due to occasional global challenges performed to measure grid behaviour under heavy loads; (2) misconfiguration of
some computing and/or storage elements; (3) user interface problems, related to hard disk saturation, heavy CPU loads, network failures, etc. These instability problems have also been observed in other works carried out on EGEE [15].

As far as the optimization of the parameters itself is concerned, the results point out that the most crucial parameter is the relaxation parameter λ. Compared to the best resolution obtained when using SPO (λ only), the results obtained by MPO translated into negligible improvements. Therefore, for experimental structural analyses by electron microscopy, the blob parameters a and α may be set to their standard values, since their optimization will not provide significantly better resolution. This also makes the parameter optimization process simpler and faster.

7. Conclusions

This work has presented an approach for parameter optimization of 3D reconstruction algorithms that combines global optimization techniques to search for the optimal set of parameters with the high-throughput capabilities of grid computing. This approach has been successfully tested on the European EGEE grid, and good levels of speedup and throughput have been obtained.

Grid computing has proved to be a powerful tool for those applications where huge amounts of CPU-intensive jobs need to be executed. In this work, we have applied it to parameter optimization of 3D reconstruction algorithms. But many other optimization problems, or other applications where a large number of independent jobs are involved, could also benefit from the work presented in this paper.

Computational performance of the grid is quite difficult to measure and predict and, moreover, it changes constantly over time. This is especially serious in large scale grids. Therefore, grid applications should be supplied with mechanisms to deal with this type of dynamics. In this work, we have provided our approach with fault tolerance techniques, which are of paramount importance to tackle long-lasting problems when using the grid.

Centering on the problem of parameter optimization in 3D reconstruction algorithms, the relaxation parameter λ turned out to be the most influential parameter in terms of the resolution of the final reconstruction. In fact, the optimization of multiple additional parameters did not involve significant improvements in the reconstruction quality. The approach presented here allows efficiently finding the optimal value of the relaxation parameter λ for the specimen and conditions under consideration, which ensures the best possible resolution in the 3D reconstruction.

Our future work consists of developing a friendly environment that allows the use of this approach for parameter optimization of 3D reconstruction in experimental environments, such as in the National Center for Biotechnology in Madrid (Spain). Also, we intend to apply the combination of grid computing and global optimization to tackle other interesting, computationally expensive real problems, such as image processing [46] or facility location [47].

Acknowledgements

The authors wish to thank Dr. X. Aguirrezabala for kindly providing the connector dataset. The Bacteriorhodopsin model was taken from the Protein Data Bank (PDB). This work has been partially supported by grants JA-P06-TIC-01426, MEC-TIN2005-00447, MEC-BFU2004-00217/BMC and EU-FP6-LSHG-CT-2004-502828, and by the Spanish National Research Council (CSIC) through the fellowship programme "Unidades Asociadas".
EGEE is a project funded by the EU under contract IST-2003-508833.

References

[1] A. Sali, R. Glaeser, T. Earnest, W. Baumeister, From words to literature in structural proteomics, Nature 422 (2003) 216–225.
[2] J.J. Fernández, C.O.S. Sorzano, R. Marabini, J.M. Carazo, Image processing and 3-D reconstruction in electron microscopy, IEEE Signal Process. Mag. 23 (3) (2006) 84–94.
[3] J. Frank, Three-Dimensional Electron Microscopy of Macromolecular Assemblies, Oxford University Press, 2006.
[4] M. Radermacher, Weighted back-projection methods, in: J. Frank (Ed.), Electron Tomography, Plenum Press, 1992, pp. 91–115.
[5] G.T. Herman, Algebraic reconstruction techniques in medical imaging, in: C. Leondes (Ed.), Medical Imaging Systems Techniques and Applications, Gordon and Breach Science, 1998, pp. 1–42.
[6] J.J. Fernández, A.F. Lawrence, J. Roca, I. García, M.H. Ellisman, J.M. Carazo, High performance electron tomography of complex biological specimens, J. Struct. Biol. 138 (2002) 6–20.
[7] J.J. Fernández, J.M. Carazo, I. García, Three-dimensional reconstruction of cellular structures by electron microscope tomography and parallel computing, J. Parallel Distribut. Comput. 64 (2004) 285–300.
[8] J.R. Bilbao-Castro, J.M. Carazo, I. García, J.J. Fernández, Parallelization of reconstruction algorithms in three-dimensional electron microscopy, Appl. Math. Modell. 30 (2006) 688–701.
[9] I. Foster, C. Kesselman (Eds.), The Grid 2: Blueprint for a New Computing Infrastructure, 2nd ed., Morgan Kaufmann Publishers, 2003.
[10] B. Boghosian, P. Coveney (Eds.), Scientific applications of grid computing I, Comput. Sci. Eng. 7 (5) (2005) 10–13.
[11] B. Boghosian, P. Coveney (Eds.), Scientific applications of grid computing II, Comput. Sci. Eng. 7 (6) (2005) 10–11.
[12] M. Parashar, C.A. Lee, Special issue on grid computing, Proc. IEEE 93 (3) (2005) 479–484.
[13] D. Clery, Infrastructure: can grid computing help us work together? Science 313 (2006) 433–434.
[14] J.J. Fernández, J.R. Bilbao-Castro, R. Marabini, J.M. Carazo, I. García, On the suitability of biological structure determination by electron microscopy to grid computing, New Generation Comput. 23 (2005) 101–112.
[15] J.J. Fernández, I. García, J.M. Carazo, R. Marabini, Electron tomography of complex biological specimens on the Grid, Future Generat. Comput. Syst. 23 (2007) 435–446.
[16] R.M. Lewitt, Alternatives to voxels for image representation in iterative reconstruction algorithms, Phys. Med. Biol. 37 (1992) 705–716.
[17] R.M. Lewitt, Multidimensional digital image representation using generalized Kaiser–Bessel window functions, J. Opt. Soc. Am. A 7 (1990) 1834–1846.
[18] S. Matej, R.M. Lewitt, G.T. Herman, Practical considerations for 3D image reconstruction using spherically symmetric volume elements, IEEE Trans. Med. Imaging 15 (1996) 68–78.
[19] R. Marabini, G.T. Herman, J.M. Carazo, 3D reconstruction in electron microscopy using ART with smooth spherically symmetric volume elements (blobs), Ultramicroscopy 72 (1998) 53–56.
[20] C.O.S. Sorzano, R. Marabini, N. Boisset, E. Rietzel, R. Schroder, G.T. Herman, J.M. Carazo, The effect of overabundant projection directions on 3D reconstruction algorithms, J. Struct. Biol. 133 (2001) 108–118.
[21] E. Alba, J.M. Troya, Analyzing synchronous and asynchronous parallel distributed genetic algorithms, Future Generat. Comput. Syst. 17 (2001) 451–465.
[22] E. Alba, A.J. Nebro, J.M. Troya, Heterogeneous computing and parallel genetic algorithms, J. Parallel Distribut. Comput. 62 (2002) 1362–1385.
[23] E. Alba, F. Luna, A.J. Nebro, J.M. Troya, Parallel heterogeneous genetic algorithms for continuous optimization, Parallel Comput. 30 (2004) 699–719.
[24] F. Luna, A.J. Nebro, E. Alba, Observations in using grid-enabled technologies for solving multi-objective optimization problems, Parallel Comput. 32 (2006) 377–393.
[25] D. Lim, Y.S. Ong, Y. Jin, B. Sendhoff, B.S. Lee, Efficient hierarchical parallel genetic algorithms using grid computing, Future Generat. Comput. Syst. 23 (2007) 658–670, doi:10.1016/j.future.2006.10.008.
[26] M. van Heel, Similarity measures between images, Ultramicroscopy 21 (1987) 95–100.
[27] N. Grigorieff, Resolution measurement in structures derived from single particles, Acta Crystallogr. D 56 (2000) 1270–1277.
[28] F. Archetti, F. Schoen, A survey on the global optimization problem: general theory and computational approaches, Ann. Operat. Res. 1 (1984) 87–110.
[29] A. Torn, A. Zilinskas, Global Optimization, Lecture Notes in Computer Science, vol. 350, Springer-Verlag, 1989.
[30] P.M. Ortigosa, J. Balogh, I. García, A parallelized sequential random search global optimization algorithm, Acta Cybernet. 14 (2) (1999) 403–418.
[31] W.L. Price, Global optimization by controlled random search, J. Optim. Theory Appl. 40 (1983) 333–348.
[32] E. Hendrix, P. Ortigosa, I. García, On success rates for controlled random search, J. Global Optim. 21 (2001) 239–263.
[33] F. Gagliardi, B. Jones, F. Grey, M.E. Bégin, M. Heikkurinen, Building an infrastructure for scientific grid computing: status and goals of the EGEE project, Philos. Trans. R. Soc. A: Math. Phys. Eng. Sci. 363 (2005) 1729–1742.
[34] B. Jones, An overview of the EGEE project, Lecture Notes in Computer Science 3664 (2005) 1–8.
[35] F. Gagliardi, The EGEE European Grid infrastructure project, Lecture Notes in Computer Science 3402 (2005) 194–203.
[36] P. Kunszt, E. Laure, H. Stockinger, K. Stockinger, File-based replica management, Future Generat. Comput. Syst. 21 (2005) 115–123.
[37] M. Tang, B.-S. Lee, X. Tang, C.-K. Yeo, The impact of data replication on job scheduling performance in the data grid, Future Generat. Comput. Syst. 22 (2006) 254–268.
[38] E. Huedo, R.S. Montero, I. Llorente, Evaluating the reliability of computational grids from the end user's point of view, J. Syst. Architect. 52 (2006) 727–736.
[39] A.G. Hoekstra, P.M.A. Sloot, Introducing grid speedup gamma: a scalability metric for parallel applications on the grid, Lecture Notes in Computer Science 3470 (2005) 245–254.
[40] A. Krishnan, GridBLAST: a Globus-based high-throughput implementation of BLAST in a grid computing framework, Concurr. Comput. Pract. Exp. 17 (2005) 1607–1623.
[41] H.L. Truong, T. Fahringer, S. Dustdar, Dynamic instrumentation, performance monitoring and analysis of grid scientific workflows, J. Grid Comput. 3 (2005) 1–18.
[42] R.S. Montero, E. Huedo, I. Llorente, Benchmarking of high throughput computing applications on grids, Parallel Comput. 32 (2006) 267–279.
[43] R. Henderson, J.M. Baldwin, T.A. Ceska, F. Zemlin, E. Beckmann, K.H. Downing, Model for the structure of bacteriorhodopsin based on high-resolution electron cryo-microscopy, J. Mol. Biol. 213 (1990) 899–929.
[44] X. Aguirrezabala, J. Martín-Benito, M. Valle, J.M. González, A. Valencia, J.M. Valpuesta, J.L. Carrascosa, Structure of the connector of bacteriophage T7 at 8 Å resolution: structural homologies of a basic component of a DNA translocating machinery, J. Mol. Biol. 347 (2005) 895–902.
[45] H. Hammerle, EGEE Grid attacks Avian flu, EGEE Newsletter, May issue (2006) 2.
[46] P.M. Ortigosa, J.L. Redondo, I. García, J.J. Fernández, A population global optimization algorithm to solve the image alignment problem in electron crystallography, J. Global Optim. 37 (2007) 527–539, doi:10.1007/s10898-006-9060-x.
[47] B. Pelegrín, J.L. Redondo, P. Fernández, I. García, P.M. Ortigosa, GASUB: finding global optima to discrete location problems by a genetic-like algorithm, J. Global Optim., in press, doi:10.1007/s10898-006-9076-2.