Computers & Geosciences 49 (2012) 21–28
Implementation and performance optimization of a parallel contour line generation algorithm

Jibo Xie
Key Laboratory of Digital Earth, Center for Earth Observation and Digital Earth, CAS, No. 9 Dengzhuang South Road, Haidian District, Beijing 100094, China
Abstract
Article history: Received 13 February 2012; received in revised form 6 June 2012; accepted 12 June 2012; available online 23 June 2012.
This paper introduces a parallel contour line interpolation algorithm for Digital Elevation Model (DEM) data. Contour generation from a DEM is a basic task of computer-aided mapping and one of the most important applications of DEMs. With increasing DEM resolution, Digital Terrain Analysis (DTA) has become one of the computing-intensive tasks in GIS, and many studies have addressed the implementation of parallel DTA algorithms on different parallel hardware and software. In this paper, an open source GIS toolkit is used to implement a parallel contour generation algorithm. The Message Passing Interface (MPI) standard is used for the parallel programming. A loosely LAN (Local Area Network)-connected Windows cluster is set up to implement and test the parallel contour line generation algorithm. Performance optimization methods for the parallel algorithm are discussed in detail, including data redundancy, group communication, packaged collection of results, and memory optimization for results merging. The experimental results show the capacity and potential of implementing parallel GIS algorithms based on an open source GIS toolkit in a LAN-connected PC environment. © 2012 Elsevier Ltd. All rights reserved.
Keywords: DEM; Contour line; Parallel computing; Performance optimization
1. Introduction

The Digital Elevation Model (DEM) is a 3D representation of the terrain surface. Photogrammetry based on airborne and spaceborne sensors has made global DEMs available. In particular, the open data access to the SRTM (Shuttle Radar Topography Mission) DEM and the ASTER GDEM promotes the further use of Digital Terrain Analysis (DTA) in hydrology, geomorphology, and other fields. The most widely used data structure for DEMs is the regular grid format. With increasing DEM grid resolution, DTA has become one of the computing-intensive tasks in Geographic Information Systems (GIS). Parallel computing has been used to enhance the performance of GIS applications, and many studies in the past decades have examined how to parallelize them. Parallel processing for GIS applications on heterogeneous networks was studied in 1996 (Clematis et al., 1996). Parallel GIS algorithms were introduced in 1998 (Healey et al., 1998). A layered approach was used to parallelize geographical applications in 1999 (Mineter and Dowers, 1999), and parallel algorithms were introduced for processing large-scale geographic data in 2003 (Hawick et al., 2003). In recent years, some researchers have tried to parallelize
Tel.: +86 10 82178066; fax: +86 10 82178062. E-mail addresses: [email protected], [email protected]
doi: 10.1016/j.cageo.2012.06.011. © 2012 Elsevier Ltd. All rights reserved.
the GRASS GIS to enhance computing performance (Sorokine, 2007; Huang et al., 2011). Many studies have also been done on the parallelization of DTA algorithms in different development phases of high performance computing. Rokos and Armstrong (1992) studied parallel feature extraction algorithms implemented on a Transputer parallel computing machine. A data-parallel method was used for drainage basin analysis (Mower, 1994). An inter-visibility analysis model was parallelized to enhance performance on a parallel computing system (Mills et al., 1992). A parallel hill shading algorithm was studied in 1994 (Ding and Densham, 1994). Parallel DTA algorithms were described in a book (Healey et al., 1998). High throughput computing was used to solve the computing-intensive problems of DTA (Mineter et al., 2003; Gong and Xie, 2009). Optimizing grid computing configuration and scheduling was tested to enhance DTA performance in a Grid computing environment (Huang and Yang, 2011). In recent years, the Graphics Processing Unit (GPU) has been used to accelerate DTA algorithms such as viewshed analysis (Fang et al., 2011). Implementing parallel GIS algorithms is difficult for common GIS researchers because of the complexity of parallel implementation and the limited access to HPC hardware. In the implementation of a parallel algorithm, data decomposition, communication, data transfer, synchronization, etc. should be considered. Parallel domain decomposition is usually cast as a graph partitioning problem, requiring that each processor have
an equal amount of data and that inter-processor communication be minimized (Antonio and Paz, 1997). Synchronization is an important issue in the design of a scalable parallel computer (Stricker et al., 1995). Communication in a parallel system frequently involves moving data from the memory of one node to the memory of another (Stricker and Gross, 1995). Minimizing communication and synchronization costs is crucial to realizing the performance potential of parallel computers (Kandemir et al., 2000). In this paper, open source GIS software is used to implement a parallel contour line generation algorithm on a LAN (Local Area Network)-connected PC cluster. Contour lines are the main graphical elements for characterizing three-dimensional terrain on two-dimensional map sheets (Schmieder and Huber, 2000). Contour line generation is a basic task of computer-aided mapping and one of the most important applications of DEMs. Before implementing a parallel algorithm, the parallelism of the serial algorithm needs to be analyzed to determine whether the algorithm can be parallelized, and the parallel prototype and data decomposition method should be studied. Performance optimization methods are the most important means of enhancing parallel efficiency, and they are one of the main focuses of this paper. This paper is organized as follows: Section 2 describes the serial contour line generation algorithm implemented in an open source GIS toolkit. The data decomposition method and computing process of the parallel contour line generation algorithm are proposed and analyzed in Section 3. In Section 4, the optimization methods to enhance the parallel performance are studied. The performance test and analysis of the parallel algorithm are described in Section 5. The last section gives the conclusion and directions for further study.
2. Serial contour line generation algorithm

Contour lines visualize flat and steep areas as well as ridges and valleys simultaneously, and the terrain shape can be detected much more easily than from a digital elevation model (Riegler et al., 2006). There are different methods to generate contours for different digital elevation data (Hiremath and Kodge, 2010). In this paper, the contour line interpolation method from DEM is used. The serial algorithm is implemented in GDAL (Geospatial Data Abstraction Library, http://www.gdal.org/), an open source GIS library. The basic principle of the contour interpolation algorithm is as follows. The value of each grid cell represents the elevation of the cell center. To generate a contour line, the centers of four neighboring cells (up, down, left and right) are connected by dashed lines as shown in Fig. 1(a). The points of a contour line with a specific elevation can be interpolated on the dashed edges, and one segment of the contour line is generated by connecting the points. As Fig. 1(a) shows, when the elevation values of the four cells are 5, 3, 1, and 12, the 10 m contour can be interpolated and connected by the solid line. Several special cases must be considered in the implementation of the algorithm.
2.1. Ambiguity of interpolation

Ambiguity arises when two of the four neighboring cells have equal elevation values, which induces uncertainty in the direction of the contour line, as shown in Fig. 1(b). The solution is to add a point at the center of the four neighboring points, with its elevation value interpolated from the four neighbors. The new point is then connected with the four neighboring points to generate sub-grids for interpolation.
Fig. 1. Illustration of the contour generation algorithm (implemented in GDAL). (a) Contour generation between four neighboring DEM cells, (b) ambiguity of the algorithm, and (c) no-data condition.
2.2. No-data condition

When a cell has a no-data value, the method illustrated in Fig. 1(c) can be used: select the middle points of the two edges adjacent to the no-data cell and connect these two points with the center point to generate two auxiliary lines. The value of the center point is calculated as the average of the three points, and the contour line can then be interpolated as shown in Fig. 1(c). If the elevation values of some cells are exactly equal to the value of the contour line, the determination process becomes complex and contour discontinuities may occur. The solution is to add a tiny offset to the elevation values of such cells. The computing sequence of the contour interpolation algorithm runs from the top line to the bottom line of the DEM data file, and each line is processed in sequence. Starting from the first line, each pair of neighboring lines is used to interpolate contour segments from top to bottom. A proper threshold should be selected to match and connect the interpolated segments into contour lines. The computing process is illustrated in Fig. 2. After reading the DEM data and input parameters, processing starts from the first line. The first line is copied as the previous line (in order to connect the contour lines to the top boundary). Otherwise, the current line is processed together with the previous line: the center points of the four cells in the two neighboring lines are connected to interpolate contour segments as shown in Fig. 1. After each line is processed, the generated segments are merged with the existing contour lines. In order to connect the contour lines to the bottom boundary, the last line of the file is copied and processed. Finally, the vector file of the contour lines is written to the output file.
Fig. 2. Process of the serial contour interpolation algorithm.

3. Parallelization of the contour line interpolation algorithm

The design of a parallel algorithm should be based on a specific parallel architecture, including parallel software and hardware. Shared memory and distributed memory are the two commonly used parallel programming models, and the distributed-memory cluster is the prevailing architecture of modern high performance computing. In this paper, a loosely connected Windows PC cluster is used as the parallel hardware. The Message Passing Interface (MPI), a parallel computing standard suited to distributed memory, is used to implement the parallel algorithm. In this type of parallel programming, the machines in the cluster are divided into a master node and worker (or slave) nodes. The master node reads the input data and parameters, partitions the data, and sends the tasks to the worker nodes. The worker nodes receive tasks from the master node and execute them in parallel; sometimes they need to communicate with each other to exchange intermediate data. After the worker nodes finish the computation, the results are sent back to the master node.

The parallelization of algorithms on square-grid DEMs can use a data-parallel computing strategy with a specific data decomposition method. According to computing complexity, DTA algorithms can be divided into three types: point computation, regional computation, and global computation. In point computation, the calculation at one grid cell has no dependence on other cells, so the algorithm has good parallelism. In regional computation, the calculation is limited to a region. In global computation, all grid cells are involved in the computation. For parallel computing with the distributed-memory model, point and regional computation algorithms have better parallelism than global computation.

The serial contour line generation algorithm processes the data line by line. The contour segments are interpolated and merged between two neighboring lines, so the computation is executed between each pair of neighboring lines. This means that the input DEM data can be decomposed horizontally into blocks for parallel processing. Each block can be assigned to a worker node, which is an effective data decomposition method for load balance.
If the number of computing nodes is not large, a block can also be assigned to the master node. A redundant line can be used in the neighboring blocks to reduce the communication between computing nodes, as shown in Fig. 3.

Fig. 3. Two types of decomposition methods. (a) Data exchange by MPI and (b) data redundancy.

The parallel algorithm is implemented in C++ based on the MPI standard. The GDAL API is used for DEM data reading and vector contour file output; the serial contour interpolation algorithm is implemented in the GDAL toolkit. The parallel computing hardware is a loosely connected cluster, and MPICH2 (http://www.mcs.anl.gov/research/projects/mpich2/) is set up as the MPI running environment. The computing process of the parallel contour generation algorithm is shown in Fig. 4. It proceeds as follows. First, the parallel program is initialized and the master node gets the number of computing nodes. The master node reads the DEM data and input parameters. Then the master node decomposes the raw DEM data into horizontal blocks according to the number of computing nodes (the master node can also take one block if the number of computing nodes is limited) and broadcasts the related parameters to each worker node. Two redundant lines of neighboring blocks should be copied to reduce
communication between computing nodes when the DEM data is decomposed. In this way, each block of data can be computed independently. Each worker node receives its DEM data block and the broadcast parameters, and then begins to interpolate and merge contour segments between each pair of lines in its block. After each computing node completes its task, the output contour lines are sent to the master node. To enhance communication efficiency, a certain number of contours are packed into packages before being sent. After the master node finishes receiving the computing results from each node, the received contour packages are unpacked and merged into contour lines. Finally, the vector contour line file is written out in the ESRI shapefile format.
4. Performance optimization for the parallel algorithm

Computing efficiency is the most important issue in parallel algorithm implementation, and the communication cost is especially important for a cluster with distributed memory. For the parallel algorithm in this paper, data scattering and result collection require communication between the master node and the worker nodes. In the message passing programming model, optimizing communication is critical for enhancing parallel efficiency. Several factors determine the time consumption of message passing in a parallel computing system (Grama et al., 2003): the start-up time (ts), the per-hop time (th), and the per-byte transfer time (tw). Considering these factors, several optimization methods can be adopted to reduce the time consumption of message passing. (1) Big-block communication: pack tiny messages into a big one, which reduces the accumulated ts of a bunch of small messages; in a cluster environment, ts is commonly much larger than th and tw for small messages. (2) Lower communication frequency: reduce the number of message passing operations by means of data redundancy. (3) Shorter data transfer distance: reduce the number of message passing hops.
Fig. 4. Parallel contour line generation algorithm.
For the contour line generation algorithm, communication consists of two parts: data distribution and results collection. For DEM data and parameter distribution, group communication avoids point-to-point communication between the master node and each worker, which reduces the frequency of message passing. In the contour interpolation process, each computing node needs to know the related parameters; the MPI broadcast operation is used to send the parameters to the computing nodes, and the MPI scatter operation is used to distribute the DEM data.

Results collection is the other part to be optimized. After processing its DEM data block, each worker node sends the generated contour lines back to the master node, and the master node merges the results received from all the worker nodes. Because the contour lines are sent as messages based on the MPI standard, a data structure has to be defined in the parallel implementation. The data structure of the contour line is shown in Fig. 5: "ContourItem" is the data structure of one contour line, and "ContourLevel" is a collection of contours with the same elevation value. This data structure is convenient for contour line boundary merging, which is executed between contour lines with the same elevation value.

Fig. 5. Data structure of contour line.

Depending on the contour interval and the elevation range, each computing block may generate thousands of contour lines. If one contour were sent per message, communication would be quite frequent and each message would pay the start-up time (ts), which would greatly reduce efficiency. The solution is to pack the data during communication: the MPI "Pack" and "Unpack" functions are used to pack the data into a continuous buffer before sending and to unpack the buffer after the package is received. For message packing, the bigger the data package, the more efficient the communication; however, an oversized package may cause the transfer to fail. In the experiments of this paper, message passing errors occurred when more than about 3000 contours were packed into one message, so packages of 1000 contour lines each proved suitable for reliable message passing.

Fig. 6. Memory optimization for contour lines merging. (a) Contour lines in the memory of the master node and (b) remaining contours.
A contour line is a continuous curve, so the contour lines computed from each block must be merged together to generate the final result. The master node receives contours from each computing node and matches and merges them in sequence. If the number of contour lines held in the memory of the master node becomes too large, it becomes a bottleneck and limits the computing scalability. Considering the continuity of contour lines, lines that are already closed (or whose start point and end point both lie on the three boundaries other than the merging boundary) can be removed from memory and written to the output file, as shown in Fig. 6. In this way, the number of contours in memory is reduced and the merging efficiency is increased.
5. Performance test and analysis

The performance of a parallel algorithm can be evaluated by three factors (Xavier and Iyengar, 1998): computing time (time complexity), number of processors (processor complexity), and machine model. Speedup and efficiency are the commonly used indicators of parallel performance. Given an optimal serial algorithm with running time Ts, a parallel algorithm with running time Tp, and processor count P, they are defined as

Speedup = Ts / Tp                                  (1)

Efficiency = Ts / (P * Tp)                         (2)

Table 1
Performance test of the parallel algorithm.

                         Computing nodes
                     1         2         4         6
2000 x 2000 DEM
  TN (s)             30.34     13.56     8.01      7.03
  SN                 1         2.24      3.79      4.32
  EN                 1         1.12      0.95      0.72
4000 x 4000 DEM
  TN (s)             235.36    130.43    86.64     84.42
  SN                 1         1.8       2.72      2.79
  EN                 1         0.9       0.68      0.465

(TN: execution time (s); SN: speedup; EN: efficiency.)

Fig. 7. Speedup of the parallel algorithm.
Normally, speedup will not exceed the number of CPUs (or CPU cores); a parallel algorithm is considered highly efficient if its speedup equals the number of CPUs. Efficiency shows the effective utilization of all the CPUs. Since P processors can finish only P units of workload in each time unit, the theoretical limit of efficiency is 1; the efficiency of very few algorithms can exceed 1, and only under special circumstances. Another indicator for evaluating a parallel algorithm is scalability, which means that the performance of the parallel system keeps growing as the amount of tasks and the number of processors increase. In a cluster, adopting appropriate strategies can enhance the parallel scalability. One way is to overlap computation and communication, i.e., to carry out communication during periods of computation.

Fig. 8. Efficiency of the parallel algorithm.

In the five steps of the parallel algorithm of this paper, the time cost consists of two parts: communication and computing. Communication time includes distributing the DEM data and parameters to all computing nodes, together with collecting the contour lines from the computing nodes. The remaining time is spent merging the received contour lines. To test the performance of the parallel contour line generation algorithm, a loosely connected cluster of 6 PCs was set up in a LAN. Each PC has a Pentium(R) 1.8 GHz CPU, 256 MB memory, and an 80 GB hard disk running Windows XP.
Fig. 9. Parallel algorithm results compared with GIS software. (a) Input DEM data and output contours (regions 1 and 2 for validation of the parallel algorithm), (b) comparison of the results of the parallel algorithm and GIS software for region 1, and (c) comparison of the results of the parallel algorithm and GIS software for region 2.
MPICH2 is set up on each PC. DEM data (grid size: 25 x 25) of 2000 x 2000 and 4000 x 4000 cells over the same area is used for the test. The maximum and minimum elevation values are 4277.4 m and 1237.3 m. The output contour files are 9.73 MB and 33.7 MB, respectively, in the vector file format. The performance test results are listed in Table 1, and the speedup and efficiency of the algorithm are shown in Figs. 7 and 8. The analysis of the performance test shows that the time consumption of the parallel algorithm is effectively reduced (though not linearly) as the number of computing nodes increases. As Fig. 7 shows, the parallel algorithm achieves considerable speedup in the loosely connected cluster. For the 2000 x 2000 DEM data, the speedup reaches 4.32 with 6 computing nodes; for the 4000 x 4000 DEM data, the maximum speedup is 2.79. The speedup decreases for the larger data set because communication becomes more frequent and result merging costs more time as the data size increases; another reason is that the memory of the master node is limited, and an effective solution is to increase the memory of the master node. As Fig. 8 shows, the computing efficiency varies with the number of computing nodes involved. With two computing nodes (including the master node), the efficiency is 1.12 (slightly greater than 1) and 0.9 for the two test data sets; with 4 nodes, it is 0.95 and 0.68; and with 6 nodes, it decreases to 0.72 and 0.465. The efficiency decreases because the time consumption of communication and result merging grows as more computing nodes are involved. There is a super-linear speedup (efficiency greater than 1) in one case, partly because of the optimization of the contour merging method introduced in Fig. 6. Because the cluster used for the experiment is only loosely connected, the performance is still limited.
Further tests on professional high-performance hardware will be done in the near future. Fig. 9(a) shows the input DEM and the output contour lines. Two sub-regions are selected for comparison with contour line results generated by current GIS software; Fig. 9(b) and (c) show the comparisons. The output contour lines of the parallel algorithm agree well with those of the commercial GIS software.
6. Conclusion

Parallel computing provides an efficient way to enhance the processing speed of DTA algorithms. This paper studies the parallel implementation and performance optimization of a contour line generation algorithm. Based on a serial algorithm implemented in GDAL, the parallel algorithm is implemented with MPI, and its performance is tested in a loosely LAN-connected cluster environment. Several optimization strategies are studied to improve the efficiency of the parallel algorithm: a horizontal block decomposition method partitions the input DEM into computing subtasks; group communication reduces the time consumption of message passing; data redundancy effectively reduces the frequency of communication; the data packaging method reduces the frequency of result collection; and a memory optimization method for contour line merging enhances the performance of the master node. The performance test shows that the parallel algorithm effectively enhances the computing efficiency of contour line generation, and the output contour lines compare well with those of commercial GIS software, which verifies the validity of the parallel algorithm. There is still much further work to do, e.g., how to parallelize the GDAL modules, how multi-core processors compare with multiple computers, and how to use GPUs to speed up DTA algorithms. GDAL is a widely
used translator library for raster geospatial data formats, and studying how to parallelize the GDAL modules is quite valuable. As introduced in Section 3, raster algorithms can be divided into three types according to computing complexity: point computation, regional computation, and global computation. Parallel programming templates (including data decomposition, data scattering, results collection, etc.) can be developed based on an analysis of the parallelizability of the different GDAL modules, and a parallel API can be implemented to ease the implementation of parallel algorithms. Parallel I/O techniques can also be used to enhance GDAL read/write performance. The processors of computers have evolved from single core to multi-core (dual-core, quad-core, hexa-core, and octa-core), and it is important to explore the potential of multi-core processors. A shared memory parallel programming model such as OpenMP can be used for multi-core parallel algorithms, and multiple computers with multi-core processors can work together in a cluster with a two-level MPI+OpenMP parallel programming scheme. GPUs have also been harnessed to enhance supercomputing performance; to speed up DTA algorithms on GPUs, a programming model such as CUDA (Compute Unified Device Architecture) can be used. It remains a challenge to implement parallel DTA algorithms on a hybrid cluster with multi-core processors and GPUs.
References

Antonio, J., Paz, I., 1997. Evaluation of parallel decomposition algorithms. In: 1st National Computer Science Encounter, Workshop of Distributed and Parallel Systems, pp. 44–50.
Clematis, A., Falcidieno, B., Spagnuolo, M., 1996. Parallel processing on heterogeneous networks for GIS applications. International Journal of Geographical Information Systems 10 (6), 747–767.
Ding, Y., Densham, P.J., 1994. A loosely synchronous, parallel algorithm for hill shading digital elevation models. Cartography and Geographic Information Systems 21 (1), 5–14.
Fang, C., Yang, C., Chen, Z., Yao, X., Guo, H., 2011. Parallel algorithm for viewshed analysis on a modern GPU. International Journal of Digital Earth 4 (6), 471–486.
GDAL—Geospatial Data Abstraction Library. http://www.gdal.org/.
Grama, A., Kumar, V., Gupta, A., Karypis, G., 2003. An Introduction to Parallel Computing: Design and Analysis of Algorithms. Addison Wesley.
Gong, J., Xie, J., 2009. Extraction of drainage networks from large terrain datasets using high throughput computing. Computers & Geosciences 35 (2), 337–346.
Healey, R., Dowers, S., Gittings, B., Mineter, M., 1998. Parallel Processing Algorithms for GIS. Taylor & Francis, London, UK, 460 pp.
Hawick, K.A., Coddington, P.D., James, H.A., 2003. Distributed frameworks and parallel algorithms for processing large-scale geographic data. Parallel Computing 29 (10), 1297–1333.
Hiremath, P.S., Kodge, B.G., 2010. Generating contour lines using different elevation data file formats. International Journal of Computer Science and Applications (IJCSA) 3 (1), 19–25.
Huang, Q., Yang, C., 2011. Optimizing grid computing configuration and scheduling for geospatial analysis: an example with interpolating DEM. Computers & Geosciences 37 (2), 165–176.
Huang, F., Liu, D., Li, X., Wang, L., Xu, W., 2011. Preliminary study of a cluster-based open-source parallel GIS based on the GRASS GIS. International Journal of Digital Earth 4 (5), 402–420.
Kandemir, M., Choudhary, A., Banerjee, P., Ramanujam, J., Shenoy, N., 2000. Minimizing data and synchronization costs in one-way communication. IEEE Transactions on Parallel and Distributed Systems 11 (12), 1232–1251.
Mower, J.E., 1994. Data-parallel procedures for drainage basin analysis. Computers & Geosciences 20 (9), 1365–1378.
Mills, K., Fox, G., Heimbach, R., 1992. Implementing an intervisibility analysis model on a parallel computing system. Computers & Geosciences 18 (8), 1047–1054.
Mineter, M.J., Dowers, S., 1999. Parallel processing for geographical applications: a layered approach. Journal of Geographical Systems 1, 61–74.
Mineter, M., Dowers, S., Caldwell, D., Gittings, B., 2003. High-throughput computing to enhance intervisibility analysis. In: Proceedings of the 7th International Conference on GeoComputation, September 2003.
MPICH2. http://www.mcs.anl.gov/research/projects/mpich2/.
Rokos, D.-K., Armstrong, M.P., 1992. Parallel terrain feature extraction. In: Proceedings of GIS/LIS'92, San Jose, California, November 10–12, vol. 2, pp. 652–661.
Riegler, G., Hoeppner, E., Li, X., 2006. Automatic contour line generation using Intermap digital terrain model. In: ASPRS 2006 Annual Conference, May 2006, pp. 1–5.
Schmieder, A., Huber, R., 2000. Automatic generation of contour lines for topographic maps by means of airborne high-resolution interferometric radar data. In: Proceedings of the ASPRS Annual Conference, Washington, DC, May 2000.
Sorokine, A., 2007. Implementation of a parallel high-performance visualization technique in GRASS GIS. Computers & Geosciences 33 (5), 685–695.
Stricker, T., Stichnoth, J., O'Hallaron, D., Hinrichs, S., Gross, T., 1995. Decoupling synchronization and data transfer in message passing systems of parallel computers. In: Proceedings of ICS '95, the 9th International Conference on Supercomputing, pp. 1–10.
Stricker, T., Gross, T., 1995. Optimizing memory system performance for communication in parallel computers. In: Proceedings of ISCA '95, the 22nd Annual International Symposium on Computer Architecture, pp. 308–319.
Xavier, C., Iyengar, S.S., 1998. Parallel Algorithms. John Wiley & Sons.