ARTICLE IN PRESS Journal of Visual Languages & Computing
Journal of Visual Languages and Computing 16 (2005) 455–479
www.elsevier.com/locate/jvlc
Distributed visibility culling technique for complex scene rendering
Tainchi Lu, Chenghe Chang
Department of Computer Science and Information Engineering, National Chiayi University, No. 300, University Rd., Chiayi 600, Taiwan
Received 12 July 2004; received in revised form 3 December 2004; accepted 14 March 2005
Abstract
This paper describes a rendering system that renders large, complex 3D scenes in an output-sensitive way by means of a distributed visibility culling technique. The proposed visibility calculation is divided into two distinct phases: a preprocessing stage and an on-the-fly stage. At the preprocessing stage, the whole scene is partitioned into numerous regions, namely spatial cells, by a BSP tree algorithm. The complexity weight of each cell is then estimated in advance from the number of geometric polygons within the cell. Afterwards, we find possible occluders in each cell to accelerate real-time occlusion culling at run time. Moreover, instant visibility is used to quickly calculate a tight potentially visible set (PVS) that is valid for several frames during the on-the-fly phase. For dynamic load balancing, we employ a cell arrangement mechanism that dynamically assigns a specific amount of service demand to each calculating machine. The amount of service demand is estimated whenever a calculating machine is dynamically inserted into or removed from the distributed calculating cluster. Finally, after the drawing machines gather the PVS results from every calculating machine, they render the scene for users to view in the following frames. The simulation results show that the proposed real-time walkthrough environment takes good advantage of the distributed visibility culling technique to display large, complex 3D scenes in real time and avoids a troublesome computation delay problem. © 2005 Elsevier Ltd. All rights reserved.
Keywords: Spatial subdivision; Occlusion culling; Load balancing; Distributed computing; Walkthrough system
Corresponding author. Tel.: +11 886 5 271 7730; fax: +11 886 5 271 7741.
E-mail address: [email protected] (T. Lu).
1045-926X/$ - see front matter © 2005 Elsevier Ltd. All rights reserved. doi:10.1016/j.jvlc.2005.03.002
1. Introduction
Over the past decades, geometry-based rendering has grown rapidly to meet the requirement of real-time computation when drawing large-scale complex scenes. Many researchers in interactive computer graphics still pay great attention to developing efficient visibility culling techniques for interactive walkthrough applications and multi-user virtual environments [1]. Even though graphics hardware (e.g. 3D graphics cards) has progressed greatly in recent years, output sensitivity cannot be achieved by relying on graphics hardware acceleration alone. Besides, high visual realism, strict real-time rendering, and tight human–computer interaction are other fundamental components of a fast walkthrough system. Consequently, a variety of visibility culling algorithms were developed in the 1990s to address the basic problem of removing invisible polygons of scene primitives. In principle, the idea of visibility culling is to decrease the number of geometric polygons by rejecting geometry that is actually invisible from the viewer's viewpoint. By constructing conservative estimates of the visible set, we can define a potentially visible set (PVS), which consists of all the visible geometry plus some occluded objects for a region of space. Visibility culling algorithms can be roughly classified into three categories: view-frustum culling [2–4], back-face culling [5,6], and occlusion culling [7–14]. In addition to visibility culling, hidden-surface removal [15,16], level-of-detail approaches [17–22], and image-based rendering [23–27] are regarded as other good approaches to accelerating scene rendering.
Although the aforementioned solutions improve rendering efficiency, they are not sufficient to achieve output sensitivity for arbitrary large-scale, complex scenes. A complex scene consists of numerous geometric objects and polygons that are widely distributed over a region of space. Most walkthrough systems [13,14] directly employ spatial subdivision to partition the scene into a number of regions, namely spatial cells. Subsequently, cell-and-portal from-region visibility determination can be performed individually for each spatial cell. However, if even a minor geometric modification is made to the scene, the visibility calculations must be carried out once more. Moreover, these visibility calculations are usually performed by a single rendering machine without any load balancing functionality. In contrast to this conventional approach, we make good use of distributed computing with instant visibility
[28] to accelerate the on-line visibility culling calculations in a conservative manner. Similarly, we use the binary space partition (BSP) tree algorithm to subdivide a complex scene into many spatial cells. The cells are delivered to the distributed computing cluster according to their geometric complexity and the computing power of the calculating machines. Note that if a running machine crashes or is shut down, the system must keep working without affecting the rendering results seen by observers. The operation of the fast walkthrough system is divided into two stages, namely a preprocessing stage and an on-the-fly stage. The purpose of preprocessing is to accelerate the rendering process and reduce drawing overheads at run time. At the on-the-fly stage, the visible polygons are selected from a potentially visible set according to the current viewpoint of an observer, and the selected PVS data are subsequently passed to the drawing process for rendering. In addition, the architecture of the system comprises three kinds of machines with different functionalities; the details are described in Section 3.
The remainder of the paper is organized as follows. Section 2 briefly gives a taxonomy of visibility culling techniques, including point-based and from-region visibility, and reviews distributed computing. We propose the architecture of the fast walkthrough system in Section 3 and present the simulation results in Section 4. Finally, conclusions are summarized and future work is addressed in the last section.
2. Brief overviews
2.1. Visibility culling
Many on-line walkthrough systems use visibility culling to eliminate invisible geometric objects and thereby accelerate rendering and drawing. Nowadays, quite a few visibility algorithms achieve this objective with the support of graphics hardware. The following is a brief description of visibility culling with respect to point-based and from-region visibility. For a more detailed survey of visibility for walkthrough applications, please refer to [29].
2.1.1. Point-based visibility
A point-based visibility method takes into account the current viewpoint of an observer in each frame and finds the visible geometric objects within the observer's current view frustum. The method aims to perform the visibility calculations in real time and to finish rendering the geometric objects before the next frame. Its major advantage is that it can precisely find the possibly visible objects with little time spent at the preprocessing stage. However, it tends to sustain a low frame rate because it must complete the visibility calculations within a frame time and repeat the same operation frame by frame according to the location of the current viewpoint. Among the point-based visibility algorithms, view-frustum culling [2–4], back-face culling [5,6], and occlusion culling [7–14] are the three most popular methods
Fig. 1. Three distinct visibility culling methods.
to implement in interactive walkthrough applications. Fig. 1 illustrates the three kinds of visibility culling methods. These visibility techniques are applied to each polygon independently of the other polygons in the scene. In particular, view-frustum culling is a common visibility method implemented in most walkthrough applications. It avoids unnecessary computations by simply rejecting the polygons that lie outside the current view frustum. Back-face culling avoids rendering faces that point away from the observer's viewpoint and are therefore hidden behind the front-facing polygons. Occlusion culling uses the spatial interrelationship among geometric objects to determine which objects are occluded by some other part of the scene. Therefore, it is more complicated than the view-frustum and back-face culling methods.
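The two simpler tests can be sketched in a few lines of Python. This is only an illustration of the geometric idea, not code from the system described here: the function names, the tuple-based vectors, and the use of a bounding sphere for the frustum test are all assumptions made for the example.

```python
# Illustrative per-polygon culling tests; vectors are plain (x, y, z) tuples.
# All names below are hypothetical and chosen for this sketch only.

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def is_back_facing(normal, point_on_face, eye):
    """True when the face's outward normal points away from the eye."""
    return dot(normal, sub(eye, point_on_face)) <= 0.0

def sphere_outside_frustum(center, radius, planes):
    """True when a bounding sphere lies entirely outside some frustum plane.

    Each plane is (normal, d) with a unit normal pointing into the frustum,
    so a point p is on the inner side when dot(normal, p) + d >= 0.
    """
    return any(dot(n, center) + d < -radius for n, d in planes)

eye = (0.0, 0.0, 5.0)
print(is_back_facing((0.0, 0.0, 1.0), (0.0, 0.0, 0.0), eye))   # False
print(is_back_facing((0.0, 0.0, -1.0), (0.0, 0.0, 0.0), eye))  # True
```

Both tests look at one polygon or bounding volume at a time, which is why they are cheap compared with occlusion culling.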
2.1.2. From-region visibility
A from-region visibility method exploits spatial coherence to calculate the maximal set of polygons that can be seen from any possible viewpoint of an observer within a specific spatial region (i.e. a cell). The method aggregates these visible polygons into a tight PVS, and the PVS remains valid for a number of frames as long as the observer stays in the same region. Thus, visibility computation can be done quickly by selecting the required visible polygons from the PVS. When the observer moves to an adjacent region, the PVS must be determined again with respect to the new region of space. In comparison with point-based visibility, most from-region visibility algorithms require significant computing time and storage in the preprocessing phase, but they need not carry out on-line visibility computation for every viewpoint in each frame. From-region visibility algorithms thus work with the potentially visible sets of a region of space in order to accelerate rendering. We now review three contemporary from-region visibility methods,
which are cell-and-portal from-region visibility, extended projections, and virtual occluders. Concerning cell-and-portal from-region visibility, Teller and Séquin [30] handle occlusion culling with the concept of conservative visibility by spatially subdividing a scene model into convex cells. The interior cells are mutually linked by portals (e.g. windows and gates). When an observer stays in any one cell to view the scene, he/she can see not only the visible objects within the current cell but also the visible portions of other cells through a series of portals. Consequently, the remaining objects, which are occluded by opaque materials (e.g. walls), are rejected to substantially reduce the rendering load. Besides, Wonka et al. present a conservative approximation of the visible region, namely instant visibility [28], which overestimates the PVS data by taking into account the observer's maximum movement speed $v_{max}$, rotation angle $\varphi$, and walking distance during the time $t$. By shrinking an occluder by the corresponding distance on both sides (i.e. occluder shrinking [14]), the instant visibility approach obtains a smaller umbra, which no longer occludes the objects located outside the shrunken umbra. An illustration is given in Fig. 2. Durand et al. [31] propose a point-based image-precision method, called extended projections, to compute the PVS of a view cell at the preprocessing stage. First, the method picks several large convex objects close to the view cell as occluders, and the geometric objects hidden behind the occluders are regarded as occludees. From all the possible points within the view cell, occluders and occludees are projected onto a projection plane, which lies behind the occluders. The principle of extended projections is similar to hierarchical occlusion maps [32] and the Z-buffer [33]; an illustration is shown in Fig. 3(a).
As long as the projections of the occludees are fully covered by the projections of the occluders, the occludees are completely invisible from any viewpoint in the view cell and need not be rendered. If several occluders are projected onto one projection plane and their projections intersect, the extended projections method can carry out occluder fusion by aggregating the intersecting projections into a larger projection area. By means of occluder fusion, more occludees can be excluded from rendering. Fig. 3(b) illustrates occluder fusion.
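The effect of occluder fusion can be seen in a deliberately simplified, one-dimensional sketch. It assumes that the extended projections have already been computed and reduces each of them to an interval on a projection line; the real method works on a 2D projection plane, and the function names here are hypothetical.

```python
# Simplified 1D illustration of occluder fusion (an assumption of this sketch:
# every extended projection is already given as an interval (lo, hi)).

def fuse(intervals):
    """Merge overlapping or touching occluder projections."""
    merged = []
    for lo, hi in sorted(intervals):
        if merged and lo <= merged[-1][1]:
            # Overlaps the previous fused projection: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged

def is_hidden(occludee, occluders):
    """An occludee is hidden when one fused projection covers it entirely."""
    lo, hi = occludee
    return any(a <= lo and hi <= b for a, b in fuse(occluders))

occluders = [(0.0, 2.0), (1.5, 4.0)]
print(is_hidden((1.0, 3.0), occluders))  # True
print(is_hidden((3.0, 5.0), occluders))  # False
```

In the first query neither occluder covers the occludee on its own; only the fused projection does, which is the extra culling that fusion provides.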
Fig. 2. Instant visibility: (a) the objects become visible after the viewpoint is moved sideways by a given distance; (b) the occluder is shrunk by that distance on both sides.
Fig. 3. Extended projections: (a) principle of extended projections; (b) occluder fusion.
Koltun et al. [34] propose the notion of from-region virtual occluders to facilitate occluder fusion. For a given view cell, virtual occluders represent the accumulated occlusion obtained by aggregating the visibility of a set of individual occluders. The method arbitrarily picks several objects as seed objects and takes each seed object as a pivot to identify its surrounding objects in an iterative manner. If the objects satisfy a particular geometric criterion, they are added to a cluster of occluders, which grows the occluder. As a result, more occlusion can be achieved by from-region virtual occluders.
2.2. Distributed computing
Both point-based and from-region visibility techniques require extra preprocessing time and storage to perform visibility computation. Therefore, accelerating the visibility computation has become a significant research issue in developing fast walkthrough applications. One practical solution is to use distributed computing with dynamic load balancing. Load balancing must balance the workloads of the machines at all times and must be able to transfer unfinished jobs or data from a heavily loaded machine to a lightly loaded or idle machine. According to the taxonomy of distributed computing systems presented by Casavant and Kuhl [35], a static load balancing algorithm must complete the data or job partitioning in the preprocessing phase and cannot alter the partitioning at run time. In contrast, a dynamic algorithm takes the current workload of the running machines into account when it assigns new jobs to them. Hence, most distributed systems adopt dynamic load balancing algorithms instead of static methods, because the static methods lack flexibility and scalability.
In [36–40], the dynamic algorithms adopt three policies, namely a location policy, an information policy, and a transfer policy, to distribute the jobs among the machines. In other words, they need to dynamically determine where to do the jobs, how to estimate the workloads, and when to transfer the tasks. Traditionally, the location policy is based on two kinds of models, namely sender-initiated and receiver-initiated migration. In sender-initiated migration, the migration is initiated at the machine where the data or jobs currently reside or are being executed. In receiver-initiated migration, the migration is initiated by the target machine. If both kinds of migration are implemented, the scheme is called symmetrically initiated migration. Furthermore, most transfer policies judge the workloads of machines by means of threshold values. However, a single threshold value may cause a waggling state problem, in which a machine repeatedly switches between states because of fluctuating workloads. To date, some sophisticated dynamic load balancing algorithms [41–43] can cope with the waggling state problem.
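One common way to damp such oscillation is to use two thresholds instead of one (hysteresis). The sketch below illustrates that general idea only; it is not claimed to be the approach of [41–43], and the class name and threshold values are assumptions.

```python
# Hypothetical two-threshold (hysteresis) transfer policy, for illustration.

class TransferPolicy:
    def __init__(self, low, high):
        if low >= high:
            raise ValueError("low threshold must be below high threshold")
        self.low = low
        self.high = high
        self.overloaded = False

    def update(self, load):
        """Return True while the machine should hand work to other machines."""
        if load > self.high:
            self.overloaded = True
        elif load < self.low:
            self.overloaded = False
        # Between the two thresholds the previous state is kept.
        return self.overloaded

def count_flips(states):
    """Number of state changes in a sequence of booleans."""
    return sum(1 for a, b in zip(states, states[1:]) if a != b)

loads = [0.5, 0.85, 0.75, 0.82, 0.78, 0.55]
single = [load > 0.8 for load in loads]
policy = TransferPolicy(low=0.6, high=0.8)
double = [policy.update(load) for load in loads]
print(count_flips(single), count_flips(double))  # 4 2
```

With a single threshold at 0.8 the state changes four times for this load trace, whereas the two-threshold policy changes state only twice.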
3. The architecture of fast walkthrough system
3.1. System overview
Fig. 4 shows the architecture of the walkthrough environment. The system architecture comprises three kinds of machines with different functionalities, described as follows:
(a) Management agent (MA): The MA is responsible for gathering workload information from the calculating machines (CMs), and it distributes spatial cells to each CM according to its current workload. Moreover, it dynamically manages the calculating and drawing machines as they join or leave the walkthrough environment. In particular, the MA applies the cell assignment mechanism, which is presented in Section 3.2, to keep the workload balanced among the CMs.
Fig. 4. The walkthrough system architecture.
(b) Calculating machine (CM): A CM calculates the set of potentially visible objects in the spatial cells that are dynamically allocated to it by the MA. By relying on the CM computing cluster, we can speed up visibility computation through distributed computing. Moreover, the CM adopts an on-line occlusion culling method, namely instant visibility (see [28] for more details), to compute on-line visibility for a specific region around the current viewpoint, and it transmits the visibility result to a drawing machine for rendering in the next frame.
(c) Drawing machine (DM): We split the visibility pipeline into visibility computation and a drawing process. The visibility computation determines which geometric objects or polygons are visible from an observer's position; the visible parts are then drawn in the drawing process. Thus, a DM is responsible only for the drawing process and does not execute the visibility computation. As a result, the more drawing machines join the system, the more observers the walkthrough system can serve.
3.2. Cell assignment mechanism
In this subsection, we follow [41] to elaborate the cell assignment mechanism for allocating cells to the CMs. The mechanism takes two major factors into account: the first is the computing efficiency of each CM, and the other is the spatial relationship among adjacent cells. In the following paragraphs, we first define an underlying model to evaluate a heterogeneous distributed system and then propose the algorithm that performs the cell assignment.
3.2.1. Model
A distributed system comprises different kinds of machines with distinct hardware devices and software programs. Intuitively, the same task executed on different machines will not have exactly the same processing time. The system therefore needs to estimate each machine's computing power and assign appropriate jobs to each machine in terms of the global state of the distributed environment. The model of the cell assignment mechanism is defined as follows.
(a) Type of machine: Suppose that a total of $m$ types of machines $T_i$, where $1 \le i \le m$, run in the heterogeneous multi-computer system. Moreover, the number of machines of type $T_i$ is $c_i$, and the average throughput of these machines is $\mu_{T_i}$. Consequently, the constitution of the distributed system can be represented as the following set with $m$ entities: $\{(T_1, \mu_{T_1}, c_1), (T_2, \mu_{T_2}, c_2), \ldots, (T_m, \mu_{T_m}, c_m)\}$. The total number of machines is $C = \sum_{i=1}^{m} c_i$. Moreover, each machine has a unique ID number $P_j$ that identifies the host. We use the set $(P_1, P_2, \ldots, P_{c_1})$ to represent the cluster of machines of type $T_1$, $(P_{c_1+1}, P_{c_1+2}, \ldots, P_{c_1+c_2})$ for machine type $T_2$, and so forth. Subsequently, $T(P_j)$ denotes the type of machine $P_j$.
(b) Type of task: Assume that a total of $n$ types of tasks $C_i$, where $1 \le i \le n$, are to be accomplished in the heterogeneous system. Each task is classified into a type according to its service demand $w_i$, code length $l_i$, and arrival rate $\lambda_i$. The service demand states how many system resources are required to execute a task; the code length is the total length of the message needed to migrate a task from its original machine to the remote target machine for later execution. Consequently, the types of tasks can be represented as the following set with $n$ entities: $\{(C_1, w_1, l_1, \lambda_1), (C_2, w_2, l_2, \lambda_2), \ldots, (C_n, w_n, l_n, \lambda_n)\}$. Afterward, $C(j)$ denotes the type of task $j$.
(c) Workload evaluation: Suppose that the workload of machine $P_i$ is $W_i$, where $W_i$ is the total execution time needed to accomplish the tasks assigned to machine $P_i$. The workload can be evaluated by adding up the service demands of the tasks on machine $P_i$ and dividing the sum by $P_i$'s throughput:
$$W_i = \frac{1}{\mu_{T(P_i)}} \sum_{j \in K_i} w_{C(j)}, \qquad (1)$$
where $K_i$ is the set of tasks assigned to machine $P_i$.
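As a small worked example of Eq. (1), the following sketch computes the workload of one machine. The task type names, service demands, and throughput value are made up for illustration and do not come from the experiments.

```python
# Workload of one machine following Eq. (1): the summed service demands of its
# tasks divided by the throughput of its machine type. All values are hypothetical.

def workload(task_types, service_demand, throughput):
    """task_types: the type C(j) of every task j assigned to the machine.
    service_demand: maps a task type to its service demand w.
    throughput: average throughput (mu) of the machine's type."""
    return sum(service_demand[t] for t in task_types) / throughput

service_demand = {"small": 2.0, "large": 6.0}
tasks_on_machine = ["small", "large", "small"]
print(workload(tasks_on_machine, service_demand, throughput=5.0))  # 2.0
```

The same set of tasks on a machine type with half the throughput would take twice as long, which is the heterogeneity that the model is meant to capture.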
3.2.2. Algorithm
As shown in Fig. 5, the major purpose of the algorithm is to eliminate unbalanced task distributions among the CMs. In other words, we must guarantee that the CMs produce their visibility results simultaneously, in time for rendering the next frame. Concerning data communication between the calculating and drawing machines, the DM first sends the current position of an observer to the CMs, and the CMs perform the visibility calculations and send the visibility results back to the DM. Based on the above description, the visibility time $t_{vis}$ can be represented as
$$t_{vis} = t_{cal} + 2t_{comm}, \qquad (2)$$
where $t_{vis}$ is the waiting time for receiving the next visibility results from the CMs, $t_{cal}$ is the CMs' calculation time, and $t_{comm}$ is the network communication time. Note that if the system is built on a reliable high-speed network (e.g. 100 Mbps Fast Ethernet), $t_{comm}$ is approximately zero, and we obtain $t_{vis} \approx t_{cal}$. Hence, the network communication time is disregarded in the proposed walkthrough system for simplicity.
3.3. Preprocessing stage
3.3.1. Scene subdivision into cells
Space-partitioning representations are used to describe interior properties by partitioning the spatial region containing objects into a set of small, non-overlapping, contiguous solids or cubes. A common space-partitioning description
Arrange_Cells (cells C, CMs M)
    Calculate the estimated weight e_i for each CM
    for each cell C_j in C
        for each CM M_i in M
            if each adjacent cell of C_j is not existed in M_i and
               the current assigned weight w_i of M_i is less than e_i then
                Allot C_j to M_i
                w_i = w_i + complexity weight w_j of C_j
            else
                reject_count = reject_count + 1
        if reject_count is equal to the number of CMs then
            Find out the CM M_min with the lowest weight ratio (w_min / e_min)
            Allot C_j to M_min
            w_min = w_min + complexity weight w_j of C_j
        reject_count = 0
Fig. 5. The algorithm of the cell assignment mechanism.
for three-dimensional objects is a binary space partitioning tree, or BSP tree [42]. A BSP tree is a hierarchical tree structure used to represent solid objects in some graphics systems. It partitions space into two parts at each level by using a splitting plane. Each non-terminal node in the BSP tree represents a single partitioning plane that divides the occupied space into two. A terminal node represents a region that is not further subdivided and contains pointers to the data structure representations of the objects intersecting that region. The tree structure is organized so that each terminal node corresponds to a region of three-dimensional space. This representation of solids takes advantage of spatial coherence to reduce storage requirements for three-dimensional objects. It also provides a convenient representation for storing information about object interiors. Fig. 6 illustrates the BSP tree representation in two dimensions and a subdivision of a two-dimensional space.
3.3.2. Creation of cell adjacency graph
After the scene subdivision, we link neighboring cells with each other to construct a cell adjacency graph [43]. By traversing the cell adjacency graph from an arbitrary cell, the adjacent cells of that cell can be found easily. In particular, at the preprocessing stage, the MA assigns the cells to the CMs by employing the cell assignment mechanism together with the cell adjacency graph. The cell adjacency graph corresponding to Fig. 6(a) is shown in Fig. 7.
Fig. 6. An illustration of partitioning a 2D scene into cells by the BSP algorithm: (a) 2D sample scene; (b) spatial cells; (c) the corresponding BSP tree.
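A two-dimensional BSP tree of the kind illustrated in Fig. 6 can be sketched as follows. The tree below is built by hand with made-up splitting lines and cell names, so it does not reproduce the scene of Fig. 6; the class and function names are likewise assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    cell_id: str  # a terminal node corresponds to one spatial cell

@dataclass
class Node:
    normal: tuple[float, float]  # splitting line: normal . p = offset
    offset: float
    front: Union["Node", Leaf]   # side where normal . p >= offset
    back: Union["Node", Leaf]

def find_leaf(tree, p):
    """Descend from the root to the cell that contains point p."""
    while isinstance(tree, Node):
        side = tree.normal[0] * p[0] + tree.normal[1] * p[1] - tree.offset
        tree = tree.front if side >= 0 else tree.back
    return tree

# Hypothetical scene: split at x = 5, then split the left half at y = 3.
root = Node((1.0, 0.0), 5.0,
            front=Leaf("B"),
            back=Node((0.0, 1.0), 3.0, front=Leaf("C"), back=Leaf("A")))

print(find_leaf(root, (1.0, 1.0)).cell_id)  # A
print(find_leaf(root, (7.0, 0.0)).cell_id)  # B
```

Locating the cell of a viewpoint costs one comparison per tree level, which is what makes the cell lookup cheap at run time.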
Fig. 7. The cell adjacency graph.
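With the adjacency graph available, the assignment algorithm of Fig. 5 can be sketched in Python. Two details are assumptions of this sketch because Fig. 5 does not state them: the estimated weight of a CM is taken as its throughput share of the total complexity weight, and the inner loop stops at the first CM that accepts the cell.

```python
# Sketch of the cell assignment in Fig. 5. The estimated-weight formula and the
# early exit from the inner loop are assumptions made for this example.

def arrange_cells(weights, adjacency, throughput):
    """weights: cell -> complexity weight; adjacency: cell -> neighbouring cells;
    throughput: CM -> relative computing power. Returns cell -> CM."""
    total_weight = sum(weights.values())
    total_power = sum(throughput.values())
    estimated = {m: total_weight * p / total_power for m, p in throughput.items()}
    assigned = {m: 0.0 for m in throughput}
    owner = {}
    for cell, w in weights.items():
        target = None
        for m in throughput:
            holds_neighbour = any(owner.get(n) == m for n in adjacency.get(cell, ()))
            if not holds_neighbour and assigned[m] < estimated[m]:
                target = m
                break
        if target is None:
            # Every CM rejected the cell: fall back to the lowest weight ratio.
            target = min(throughput, key=lambda m: assigned[m] / estimated[m])
        owner[cell] = target
        assigned[target] += w
    return owner

weights = {"a": 10, "b": 10, "c": 10, "d": 10}
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(arrange_cells(weights, adjacency, {"m1": 1.0, "m2": 1.0}))
# {'a': 'm1', 'b': 'm2', 'c': 'm1', 'd': 'm2'}
```

For this chain of four cells, adjacent cells end up on different CMs, so the cells around any viewpoint are shared between both machines.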
3.3.3. Calculation of cell complexity
Suppose that a complex scene has been subdivided into $n$ cells by the BSP algorithm. The complexity weight $w_i$ of cell $C_i$, where $1 \le i \le n$, is estimated according to the number of geometric polygons in the cell $C_i$. The larger the complexity weight of a cell, the more time is required to calculate its PVS and render it. Consequently, the system must achieve load balancing at the on-the-fly stage by assigning cells with low complexity weight to CMs with low throughput.
3.3.4. Occluder selection using extended projections
Even though a large and complex scene has been broken down into cells so that large portions of the scene can be pruned quickly, the number of geometric polygons within a cell is still considerable. All parts hidden behind opaque objects need to be rejected from the view cells by using extended projections, and we then store the PVS results on disk in order to retrieve them on demand during an interactive walkthrough. As shown in Fig. 8, for an arbitrary view cell, occluders and occludees
Fig. 8. Using extended projections to select occluders.
are all projected onto a projection plane, and an occludee is declared a hidden object if its projection is entirely covered by the cumulative projection of the occluders. A projection area is defined in advance to serve as the threshold for being an occluder. When the extended projection of a polygon is larger than this area, we treat the polygon as an occluder and add it to the occluder set. The selection continues until all occluders in the scene have been found. In addition to occluder selection, occluder shrinking must be applied to the occluder set by shrinking each occluder by an adequate amount. The purpose is to calculate a visibility solution that will be valid for several frames at the on-the-fly stage.
3.4. On-the-fly stage
3.4.1. Add calculating and drawing machines
During the on-the-fly stage, calculating and drawing machines (i.e. CMs and DMs) can be dynamically added to the system to increase the overall computing power and to support more users walking through 3D scenes with different viewpoints. After a new CM joins the computing pool, the system needs to reassign the spatial cells to each CM according to its current workload. The seven steps for adding a new CM to the system are described as follows, and the diagram is shown in Fig. 9(a).
(1) A new CM connects to the management agent (MA) to obtain the up-to-date system information.
(2) The MA accepts the connection request from the CM and keeps the binding information of the CM in the CM table. Meanwhile, it propagates the information to the DMs and sends the information about all DMs back to the new CM.
(3) The MA prepares to reassign the cells to each CM by employing the cell assignment mechanism. When the new CM and the DMs receive the updated information from the MA, they update their tables so that they can associate with each other in the next step.
(4) After the cell reassignment is accomplished, each CM becomes responsible for the calculations of its new cells instead of the old ones. Moreover, the MA sends the complete cell information to the DMs to notify them which cells are calculated by which CMs.
(5) The new CM and the DMs update their cell tables. However, the old CMs cannot update their cell tables immediately and keep the old cell information temporarily. This is because the new CM has not started working at this moment, so the old CMs need to use the stale cell information to keep computing for the DMs' drawing.
(6) Once the new CM has been initialized successfully, the MA sends an acknowledgement to the DMs and the other CMs to announce that the new computing service is available.
(7) The old CMs then update their cell tables with the new cell information and carry out the visibility calculation under the dynamic load balancing.
Fig. 9. The flow diagrams to add a new CM and DM: (a) add a new CM; (b) add a new DM.
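The delayed table switch on the old CMs can be sketched as a small simulation. Networking, the MA, and the DMs are left out, and the class and method names are hypothetical; the sketch only shows that every cell stays covered until the acknowledgement arrives.

```python
# Simplified simulation of the cell table switch when a new CM joins.
# All names are hypothetical; message passing is replaced by direct calls.

class CalculatingMachine:
    def __init__(self, name, cells=()):
        self.name = name
        self.active = set(cells)  # cells this CM currently computes
        self.pending = None       # new assignment, held until the MA's ack

    def receive_assignment(self, cells):
        self.pending = set(cells)

    def receive_ack(self):
        if self.pending is not None:
            self.active, self.pending = self.pending, None

def add_cm(old_cms, new_cm, new_assignment):
    """Return the cells still covered by the old CMs just before the ack."""
    for cm in old_cms:
        cm.receive_assignment(new_assignment[cm.name])  # stale table is kept
    new_cm.active = set(new_assignment[new_cm.name])    # new CM initializes
    covered_before_ack = set().union(*(cm.active for cm in old_cms))
    for cm in old_cms:
        cm.receive_ack()                                # switch to new table
    return covered_before_ack

cm1 = CalculatingMachine("cm1", cells={1, 2, 3, 4})
cm2 = CalculatingMachine("cm2")
covered = add_cm([cm1], cm2, {"cm1": {1, 2}, "cm2": {3, 4}})
print(sorted(covered), sorted(cm1.active), sorted(cm2.active))
# [1, 2, 3, 4] [1, 2] [3, 4]
```

Keeping the stale table until the acknowledgement means that no cell is left without a responsible CM while the new machine is still initializing.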
In addition to adding a new CM, the system permits adding a new DM to support more users observing scenes from different viewpoints. First, the new DM connects to the MA, just like an ordinary client, to obtain the information about the CMs. The MA updates the DM table and propagates the information to the CMs. Once the DM is connected with the CMs, it can request visibility calculations from the CMs just as an existing DM does. Fig. 9(b) shows the step diagram of adding a new DM.
3.4.2. Calculation and rendering
Even though a dense, complex scene has been partitioned into spatial cells by the BSP algorithm at the preprocessing stage, the number of cells is still considerable. If a single machine were responsible for both the visibility calculation and the rendering process, the user would obviously have to wait for the display instead of walking through the scene in real time. The solution is to divide the execution into calculation and rendering operations and assign these jobs to different hosts. In other words, the CM is responsible for calculating the PVS data and
Fig. 10. The interoperability between the CM and DM.
the DM solely displays the visibility results. The interoperation between the CM and DM is illustrated in Fig. 10, and the crucial steps are described below:
(1) An observer can move his/her view position to observe the scene while walking through it. Meanwhile, the DM promptly updates the view position of the observer and sends it to the CM cluster.
(2) After each CM receives the view position from the DM, it employs the instant visibility approach to perform conservative visibility culling on its designated cells. The visibility culling consists of view-frustum culling, back-face culling, and occlusion culling, which remove the objects that are invisible from the current view position.
(3) The DM renders the scene after it collects the final PVS results from each CM.
Furthermore, Fig. 11 shows the pseudocode corresponding to the PVS calculation of the CM, and the functions are listed in Table 1.
3.4.3. Remove calculating and drawing machines
Concerning the fault tolerance of the walkthrough system, the CMs and DMs may crash once in a while or be manually removed from the cluster by the system administrator. Without fault tolerance functionality, the whole system might then collapse. In contrast with the operation of adding a CM or DM, the major difference is that the present DM must take over the remaining PVS from the removed CM to perform the rest of the visibility calculations. Therefore, we prevent the
Calculate_PVS (view position VP, assigned cells C)
    virtual frustum VF' = Adjust_View_Frustum (VP)
    located cell L = Find_Leaf (VP)
    occluders D = Find_Occluders (L)
    for each occluder Di of D
        if In_Frustum (Di, VF') = FALSE then
            remove Di from D
    for each cell Ci of C
        if In_Frustum (Ci, VF') = TRUE then
            if Is_Occluded (Ci, D) = FALSE then
                for each face F of Ci
                    if Is_Occluded (F, D) = FALSE then
                        add F to PVS
Fig. 11. The pseudo codes of the CM to calculate the PVS data.
Table 1. The description of the function

| Function name | Description |
| --- | --- |
| Adjust_View_Frustum | Define the maximum region of the view frustum by translating units and rotating j degrees in the period of t. |
| Find_Leaf | Search for the located cell in terms of the present viewpoint. |
| Find_Occluders | Define the set of occluders within a cell. |
| In_Frustum | To determine the object that is visible or not by the view-frustum culling. |
| Is_Occluded | To determine the object that is occluded or not by the occlusion culling. |
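The pseudocode of Fig. 11 can be sketched in Python as follows. This is only a minimal sketch under stated assumptions: the five functions of Table 1 are passed in as callables because their geometric implementations are not given here, and `faces_of` is a hypothetical helper that returns the faces of a cell.

```python
from typing import Callable, List, Sequence


def calculate_pvs(
    view_position,
    assigned_cells: Sequence,
    adjust_view_frustum: Callable,
    find_leaf: Callable,
    find_occluders: Callable,
    in_frustum: Callable,
    is_occluded: Callable,
    faces_of: Callable,  # hypothetical helper: cell -> iterable of faces
) -> List:
    """Return the faces of the assigned cells that survive culling (the PVS)."""
    frustum = adjust_view_frustum(view_position)  # virtual frustum VF'
    located_cell = find_leaf(view_position)       # leaf cell containing the viewer
    # Keep only the occluders that lie inside the virtual frustum.
    occluders = [d for d in find_occluders(located_cell) if in_frustum(d, frustum)]

    pvs = []
    for cell in assigned_cells:
        if not in_frustum(cell, frustum):         # view-frustum culling
            continue
        if is_occluded(cell, occluders):          # the whole cell is hidden
            continue
        for face in faces_of(cell):
            if not is_occluded(face, occluders):  # per-face occlusion culling
                pvs.append(face)
    return pvs
```

Passing the predicates as arguments keeps the control flow of Fig. 11 visible and lets each culling test be replaced or stubbed independently.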
Fig. 12. The flow diagrams to remove a CM and a DM: (a) remove a CM; (b) remove a DM.
system from failing. Apart from the above takeover step, the other operations are similar to the steps for adding a CM to the environment, and the step diagram is shown in Fig. 12(a). The operation of removing an existing DM is analogous to adding a new DM to the system. It can be regarded as a client leaving the system; the system then
just deletes all information regarding the DM. Fig. 12(b) demonstrates the step diagram of removing a DM.
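The takeover described above for a removed CM can be illustrated with a small bookkeeping sketch. It is not the system's actual implementation: the class name `Cluster`, its methods, and the round-robin hand-back are all assumptions chosen for brevity, and the paper's own cell assignment policy is not reproduced.

```python
from typing import Dict, Set


class Cluster:
    """Toy bookkeeping of which machine currently owns which spatial cells."""

    def __init__(self, assignment: Dict[str, Set[int]]):
        self.cm_cells = {cm: set(cells) for cm, cells in assignment.items()}
        self.dm_takeover: Set[int] = set()  # cells the DM computes temporarily

    def remove_cm(self, cm_id: str) -> None:
        """A CM crashed or was removed: the DM takes over its cells."""
        self.dm_takeover |= self.cm_cells.pop(cm_id)

    def redistribute(self) -> None:
        """Hand the taken-over cells to the remaining CMs (round-robin, assumed)."""
        if not self.cm_cells:
            return  # no CM left: the DM keeps the work
        cms = sorted(self.cm_cells)
        for i, cell in enumerate(sorted(self.dm_takeover)):
            self.cm_cells[cms[i % len(cms)]].add(cell)
        self.dm_takeover.clear()

    def owned_cells(self) -> Set[int]:
        """Every cell that some machine is still responsible for."""
        owned = set(self.dm_takeover)
        for cells in self.cm_cells.values():
            owned |= cells
        return owned
```

The point of the sketch is the invariant: at every moment each cell belongs either to a CM or to the DM's takeover set, so no part of the scene is left without a visibility calculation.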
4. Simulation results
4.1. Results at the preprocessing stage
We have implemented and tested distributed visibility culling on a number of personal computers with a 2.0 GHz Intel Pentium 4 CPU and a GeForce4 Ti 4200 graphics board under OpenGL. The walkthrough system uses a dense test scene in the Quake III format, which is freely available through the Web site of id Software [44]. Moreover, software developers can use the Q3Radiant editor to create scene maps themselves and export them as BSP-formatted files. The test scene measures 1230 × 720 × 1780 units, and it has been subdivided into 2233 spatial cells with a total of 980,401 polygons. Fig. 13 shows screen snapshots captured from the walkthrough system. For a complex scene, a significant amount of time used to be spent on calculating a set of potentially visible objects for each viewpoint during the preprocessing phase. In particular, even a slight modification of the scene requires extra time to repeat similar visibility calculations. Therefore, reducing the preprocessing time has been an important issue for fast walkthrough systems over the last few years. First, we use a BSP tree to subdivide a scene into a number of spatial cells, so that the visibility culling relies on the spatial cells. The cell assignment mechanism is adopted to distribute the spatial cells to the CM computing cluster for further visibility culling calculation. By using the
Fig. 13. Screen snapshots of the walkthrough system: (a) a bird's-eye view of the scene; (b) part of the scene.
Table 2. The preprocessing time (s) with the different number of calculating machines

| Number of objects | 1 CM | 2 CMs | 3 CMs | 4 CMs | 5 CMs |
| --- | --- | --- | --- | --- | --- |
| 100 | 445.881 | 232.349 | 154.208 | 115.358 | 94.758 |
| 200 | 526.137 | 275.350 | 182.749 | 137.502 | 111.362 |
| 300 | 604.296 | 315.985 | 211.421 | 157.863 | 126.311 |
| 400 | 693.606 | 357.563 | 238.339 | 178.753 | 144.300 |
| 500 | 768.942 | 395.431 | 263.658 | 197.115 | 159.496 |
Fig. 14. The calculating speedup has linear progression as the number of CM is increased from one to five. [Plot: speedup (0 to 6) versus No. of CM (1 to 5); one curve each for No. of object = 100, 200, 300, 400, and 500, plus a linear speedup reference line.]
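The speedup plotted in Fig. 14 can be recomputed from the preprocessing times of Table 2. The sketch below assumes the usual definitions, which are not spelled out in the text: speedup S(n) = T(1) / T(n) and parallel efficiency E(n) = S(n) / n, where T(n) is the preprocessing time with n CMs. The function names are illustrative.

```python
# Preprocessing times in seconds from Table 2, keyed by the number of objects;
# each list holds the times for 1, 2, 3, 4, and 5 CMs.
TABLE_2 = {
    100: [445.881, 232.349, 154.208, 115.358, 94.758],
    200: [526.137, 275.350, 182.749, 137.502, 111.362],
    300: [604.296, 315.985, 211.421, 157.863, 126.311],
    400: [693.606, 357.563, 238.339, 178.753, 144.300],
    500: [768.942, 395.431, 263.658, 197.115, 159.496],
}


def speedup(times):
    """Speedup S(n) = T(1) / T(n) for n = 1..len(times)."""
    base = times[0]
    return [base / t for t in times]


def efficiency(times):
    """Parallel efficiency E(n) = S(n) / n."""
    return [s / n for n, s in enumerate(speedup(times), start=1)]


if __name__ == "__main__":
    for objects, times in TABLE_2.items():
        row = ", ".join(f"{s:.2f}" for s in speedup(times))
        print(f"{objects} objects: {row}")
```

Under these definitions, the speedup with five CMs lies between about 4.7 and 4.8 for every object count in Table 2, which is close to the linear reference line.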
highly efficient distributed computing cluster with dynamic load balancing to speed up the calculation, the total amount of PVS computation can be adaptively spread over the calculating machines. Finally, the calculation results are aggregated into the PVS for carrying out the occlusion culling at run time. As shown in Table 2, up to five CMs are successively added to the distributed computing cluster to perform the visibility calculations while the number of objects is increased from 100 to 500. From Table 2, we can see that the more CMs are added to the cluster, the less calculation time is needed at the preprocessing stage. Furthermore, we also achieve a high calculation speedup for each number of objects; the speedup diagram is shown in Fig. 14.
4.2. Results at the on-the-fly stage
In this experiment, we compare the view-frustum culling technique with the run-time occlusion culling method by estimating their frame
Fig. 15. The frame time in comparison with the view-frustum culling and the run-time occlusion culling. [Plot: frame time (ms), 0 to 350, versus frame number, 1 to 901; curves for view-frustum culling and run-time occlusion culling.]
time, visibility time, draw time, number of objects, and number of polygons. In principle, the frame time is the sum of the visibility time and the draw time. Fig. 15 compares the frame time of the view-frustum culling with that of the run-time occlusion culling. From the simulation results we can see that neither technique meets the underlying requirement of real-time visibility calculation, because their frame times are not smooth and show very large jitter. Therefore, the observers could either view an incorrect scene or have to wait a while for the scene display. First, the view-frustum culling cannot exclude the invisible occludees located inside the view frustum, so it cannot avoid needless rendering. If the number of invisible occludees is very large, the total rendering time exceeds the visibility time; in other words, the draw time is almost equal to the frame time. The results of the view-frustum culling are shown in Fig. 16(a). As for the run-time occlusion culling, it spends a lot of time performing the exact visibility matching between the occluders and occludees, and it removes many invisible objects that are hidden by the occluders. That is, the visibility time occupies a large portion of the frame time. On the other hand, the average draw time can be kept below 16 ms, which satisfies the basic requirement of real-time rendering. The results of the run-time occlusion culling are shown in Fig. 16(b). Although the view-frustum culling clearly decreases the number of invisible objects and polygons, it is not output-sensitive, because the display efficiency depends entirely on the number of objects located within the view frustum. The results indicate that the maximum number of visible objects is 459 and the minimum number is 25.
In contrast, the run-time occlusion culling removes almost all invisible objects and polygons,
and it turns out that the number of visible objects is between 0 and 137. Fig. 17 compares the numbers of visible objects and polygons for the view-frustum culling and the run-time occlusion culling. From the above experimental results, we conclude that a fast walkthrough system for static scenes needs to take advantage of the cell-and-portal properties to calculate the PVS data by the view-frustum culling at the preprocessing stage. Obviously, this PVS calculation is a time-consuming task and is not suitable for execution at run time. At the on-the-fly stage, we employ the on-line occlusion culling (i.e. instant visibility) and the cell assignment mechanism to obtain a low display time. In particular, the visibility and rendering processes are carried out separately in the CM and DM pools.
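The quantities compared in this experiment can be made concrete with a short sketch. It assumes, as stated in the text, that the frame time is the sum of the visibility time and the draw time; measuring jitter as the population standard deviation of the frame times and checking frames against a budget `budget_ms` are our own illustrative choices, not definitions taken from the experiment.

```python
from statistics import mean, pstdev


def frame_times(visibility_ms, draw_ms):
    """Frame time per frame = visibility time + draw time (both in ms)."""
    return [v + d for v, d in zip(visibility_ms, draw_ms)]


def summarize(frame_ms, budget_ms):
    """Mean frame time, jitter (std. dev.), and share of frames within budget."""
    within = sum(1 for t in frame_ms if t <= budget_ms)
    return {
        "mean_ms": mean(frame_ms),
        "jitter_ms": pstdev(frame_ms),
        "within_budget": within / len(frame_ms),
    }
```

Applied to per-frame measurements such as those behind Fig. 16, a summary of this kind separates two failure modes: a high mean frame time, and a low mean with high jitter.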
Fig. 16. The frame, visibility, and draw times by means of employing the view-frustum culling and run-time occlusion culling: (a) the view-frustum culling; (b) the run-time occlusion culling. [Plots: time (ms) versus frame number, 1 to 901; curves for frame time, visibility time, and draw time in each panel.]
Fig. 17. The comparison between view-frustum culling and run-time occlusion culling: (a) the number of visible objects; (b) the number of visible polygons. [Plots: (a) visible objects, 0 to 500, and (b) visible polygons, 0 to 1000000, versus frame number, 1 to 901; curves for view frustum culling and run-time occlusion culling.]
Fig. 18. Dynamically adding and removing CMs or DMs at the on-the-fly stage: (a) the frame time is stable as new CMs are added; (b) the visibility time decreases as new CMs are added; (c) the frame time has a little jitter as existing CMs are removed; (d) the visibility time increases as existing CMs are removed. [Plots: (a), (c) frame time (ms) and (b), (d) visibility time (ms) versus frame number.]
4.3. Dynamic cluster computing
In the case of dynamic cluster computing, load balancing and fault tolerance are two crucial techniques that improve execution performance and running stability, respectively. In principle, the more computers work together, the less execution time is needed. In this experiment, we first use one DM to render a scene from a fixed viewpoint, and then successively add a number of CMs while keeping track of the drawing results in the DM. Fig. 18(a) shows the frame time as new CMs are added to the cluster. In the beginning, the first CM joins the cluster to be responsible for calculating the PVS data. Subsequently, the second, third, and fourth CMs are inserted into the environment when the frame number is at 1018, 1694, and 2409, respectively. The results reveal that the frame time of the DM remains stable and smooth, and the
dynamic adding operation has only a slight influence on the DM's drawing process. In contrast to the frame time, the visibility time decreases as more and more CMs join the cluster and cooperate with each other. The simulation results are given in Fig. 18(b). Nevertheless, when the frame number is at 1018, 1694, and 2409, the visibility time spikes briefly. This is due to the computing latency between the old CMs and the newly joined CM while the MA reassigns cells to each CM. Dynamically removing a running CM from the cluster is similar to the procedure described above. At first, four CMs have joined the cluster. Subsequently, the first, second, and third CMs are removed from the system when the frame number is at 509, 1123, and 1596, respectively. The phenomena are analogous to those of adding new CMs to the cluster, except that the visibility time increases as more and more CMs are removed. Fig. 18(c) and (d) show the simulation results of dynamically removing existing CMs at the on-the-fly stage.
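The reassignment of cells when a CM joins or leaves can be illustrated with a simple greedy heuristic. This is a stand-in under stated assumptions and not the cell assignment mechanism used in the experiments: each cell carries a weight (for example its polygon count), each CM has a hypothetical relative computing power, and the heaviest cells are placed first on the CM that would finish earliest.

```python
from typing import Dict, List


def assign_cells(
    cell_weights: Dict[int, int],   # cell id -> weight, e.g. polygon count
    cm_power: Dict[str, float],     # CM id -> relative computing power (assumed)
) -> Dict[str, List[int]]:
    """Greedy assignment: heaviest cell first, to the CM that finishes earliest."""
    assignment: Dict[str, List[int]] = {cm: [] for cm in cm_power}
    load = {cm: 0.0 for cm in cm_power}
    # Sort by descending weight; ties are broken by cell id for determinism.
    for cell, weight in sorted(cell_weights.items(), key=lambda kv: (-kv[1], kv[0])):
        # Estimated finish time if this cell were added to CM m.
        cm = min(cm_power, key=lambda m: ((load[m] + weight) / cm_power[m], m))
        assignment[cm].append(cell)
        load[cm] += weight
    return assignment
```

Calling `assign_cells` again with the updated set of CMs models a reassignment step; the brief spikes in visibility time reported above occur while such a reassignment is in progress.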
5. Discussion
Various methods have been proposed to accomplish visibility culling in either a conservative or an optimistic mode. We can roughly divide these methods into two categories: point-based and from-region visibility methods. Point-based methods are subdivided into object-precision and image-precision techniques, and from-region approaches take advantage of the cell-and-portal structure of architectural environments. Detailed descriptions of visibility have been given in Section 2. In particular, Cohen-Or et al. [29] have surveyed a wide variety of visibility algorithms for walkthrough and related applications. The proposed distributed visibility culling technique integrates view-frustum, back-face, and run-time occlusion culling with dynamic distributed computing to develop a fast walkthrough system. The simulation results have shown that the system not only achieves a low frame time for rendering larger and more complex 3D scenes but also supports dynamic cluster computing. Table 3 compares some related visibility methods with our technique according to the taxonomy presented in [29].
6. Conclusions and future work
We have presented a distributed visibility culling technique that extends the classical visibility culling algorithms for various walkthrough applications. By means of distributed computing, the time for calculating a PVS at the preprocessing stage has been considerably reduced. Likewise, the run-time occlusion culling can be carried out at a high frame rate in the on-the-fly phase. With respect to load balancing and fault tolerance, we have proposed the cell arrangement mechanism to dynamically distribute the spatial cells among the calculating machines depending on their current workload and computing power. If any calculating machine crashes or is removed from the computing cluster,
Table 3. The proposed technique is in comparison with the other methods

| Method | 2D/3D | Conservative | Occluders | Preprocessing | Hardware | Cluster computing | Multiple users |
| --- | --- | --- | --- | --- | --- | --- | --- |
| The distributed visibility culling technique | 3D | Yes | All and portals | PVS by distributed computing | Yes, by Z-buffer | Yes, with dynamic load balance and fault tolerance | Yes, with multiple view positions |
| From-region cell-portals visibility | 2D/3D | Depending on methods | Portals | PVS | None | None | None |
| Generic from-region visibility | 2D/3D | Yes | All or large subset | PVS or virtual occluders | Depending on methods | None | None |
| Object-precision point-based visibility | 3D | Depending on methods | All, portals, or large convex | Cell connectivity or occluders selection | None | None | None |
| Image-precision point-based visibility | 2.5D/3D | Depending on methods | All or large subset | None or occluders | Yes | None | None |
the machine's remaining tasks can be taken over promptly by a specific drawing machine. Subsequently, the system redistributes the spatial cells to the working calculating machines in order to display the next frame smoothly. Moreover, the on-line visibility algorithm, namely instant visibility, combines the advantages of from-point and from-region visibility and produces a PVS that remains valid for a sufficiently large region of space. We have implemented instant visibility in the aforementioned distributed environment in a conservative manner and rendered large, complex 3D scenes in an output-sensitive way. In the near future, we aim to extend the distributed walkthrough system to support dynamic scenes with fast rendering of fake soft shadows. A dynamic scene contains a great many moving objects, so additional time is required to update the model to reflect these objects' motions. Furthermore, it is more difficult to sustain a high frame rate while performing the on-line occlusion culling and generating more realistic soft shadows for dynamic objects.
Acknowledgments
The authors thank the editor-in-chief and the anonymous reviewers for their comments. This work is supported by the National Science Council in Taiwan under grant number NSC93-2213-E-415-006.
References
[1] T.A. Funkhouser, RING: A client-server system for multi-user virtual environments, Proceedings of SIGGRAPH Symposium on Interactive 3D Graphics: Computer Graphics (1995) 30–39.
[2] U. Assarsson, T. Möller, Optimized view frustum culling algorithms for bounding boxes, Journal of Graphics Tools 5 (1) (2000) 9–22.
[3] L. Bishop, D. Eberly, T. Whitted, M. Finch, M. Shantz, Designing a PC game engine, IEEE Computer Graphics and Applications 18 (1) (1998) 46–53.
[4] J.H. Clark, Hierarchical geometric models for visible surface algorithms, Communications of the ACM 19 (10) (1976) 547–554.
[5] S. Kumar, D. Manocha, W. Garret, M.C. Lin, Hierarchical back-face computation, Proceedings of EUROGRAPHICS’96 on Rendering Techniques (1996) 235–244.
[6] H. Zhang, K.E. Hoff III, Fast backface culling using normal masks, Proceedings of 1997 Symposium on Interactive 3D Graphics (1997) 103–106.
[7] G. Schaufler, J. Dorsey, X. Decoret, F.X. Sillion, Conservative volumetric visibility with occluder fusion, Proceedings of SIGGRAPH’00 on Computer Graphics (2000) 229–238.
[8] S. Coorg, S. Teller, Temporally coherent conservative visibility, Proceedings of the 12th Symposium Computational Geometry (1996) 78–87.
[9] S. Coorg, S. Teller, Real-time occlusion culling for models with large occluders, Proceedings of SIGGRAPH’97 on Interactive 3D Graphics (1997) 83–90.
[10] J. Bittner, V. Havran, P. Slavik, Hierarchical visibility culling with occlusion trees, Proceedings of International’98 on Computer Graphics (1998) 207–219.
[11] N. Chin, S. Feiner, Near real-time shadow generation using BSP trees, Proceedings of SIGGRAPH’89 on Computer Graphics 23 (1) (1989) 99–106.
[12] D. Cohen-Or, G. Fibich, D. Halperin, E. Zadicario, Conservative visibility and strong occlusion for viewspace partitioning of densely occluded scenes, Proceedings of EUROGRAPHICS’98 on Computer Graphics 17 (3) (1998) 243–254.
[13] P. Wonka, D. Schmalstieg, Occluder shadows for fast walkthroughs of urban environments, Proceedings of EUROGRAPHICS’99 on Computer Graphics 18 (3) (1999) 51–60.
[14] P. Wonka, M. Wimmer, D. Schmalstieg, Visibility preprocessing with occluder fusion for urban walkthroughs, Proceedings of EUROGRAPHICS’00 on Rendering Techniques (2000) 71–82.
[15] A. Watt, F. Policarpo, 3D Games: Real-time Rendering and Software Technology, vol. 1, Addison-Wesley, Reading, MA, 2001.
[16] J.D. Foley, A. van Dam, S.K. Feiner, J.F. Hughes, Computer Graphics, Principles and Practice, second ed., Addison-Wesley, Reading, MA, 1990.
[17] T.A. Funkhouser, C.H. Sequin, Adaptive display algorithm for interactive frame rates during visualization of complex virtual environments, Proceedings of SIGGRAPH’93 on Computer Graphics (1993) 247–254.
[18] M. Garland, P.S. Heckbert, Surface simplification using quadric error metrics, Proceedings of SIGGRAPH’97 on Computer Graphics (1997) 209–216.
[19] H. Hoppe, Progressive meshes, Proceedings of SIGGRAPH’96 on Computer Graphics (1996) 99–108.
[20] P. Lindstrom, D. Koller, W. Ribarsky, L.F. Hodges, N. Faust, G.A. Turner, Real-time, continuous level of detail rendering of height fields, Proceedings of SIGGRAPH’96 on Computer Graphics (1996) 109–118.
[21] P. Lindstrom, Out-of-core simplification of large polygonal models, Proceedings of SIGGRAPH’00 on Computer Graphics (2000) 259–262.
[22] J. Rossignac, P. Borrel, Multi-resolution 3D approximations for rendering complex scenes, Proceedings of Conference on Geometric Modeling in Computer Graphics (1993) 455–465.
[23] D. Aliaga, J. Cohen, A. Wilson, H. Zhang, C. Erikson, K. Hoff, T. Hudson, W. Stuerzlinger, E. Baker, R. Bastos, M. Whitton, F. Brooks, D. Manocha, A framework for the real-time walkthrough of massive models, Technical Report UNC TR# 98-103, University of North Carolina at Chapel Hill, 1998.
[24] G. Schaufler, W. Stürzlinger, Three dimensional image cache for virtual reality, Proceedings of EUROGRAPHICS’96 (1996) 227–236.
[25] G. Schaufler, Nailboards: A rendering primitive for image caching in dynamic scenes, Proceedings of EUROGRAPHICS’97 on Rendering Techniques (1997) 151–162.
[26] J. Shade, D. Lischinski, D.H. Salesin, T. DeRose, J. Snyder, Hierarchical image caching for accelerated walkthroughs of complex environments, Proceedings of SIGGRAPH’96 on Computer Graphics (1996) 75–82.
[27] J. Shade, S. Gortler, Li-wei He, R. Szeliski, Layered depth images, Proceedings of SIGGRAPH’98 on Computer Graphics (1998) 231–242.
[28] P. Wonka, M. Wimmer, F.X. Sillion, Instant visibility, Computer Graphics Forum 20 (3) (2001) 411–421.
[29] D. Cohen-Or, Y.L. Chrysanthou, C.T. Silva, F. Durand, A survey of visibility for walkthrough applications, IEEE Transactions on Visualization and Computer Graphics 9 (3) (2003) 412–431.
[30] S.J. Teller, C.H. Séquin, Visibility preprocessing for interactive walkthroughs, Proceedings of the 18th Annual Conference on Computer Graphics and Interactive Techniques (1991) 61–70.
[31] F. Durand, G. Drettakis, J. Thollot, C. Puech, Conservative visibility preprocessing using extended projections, Proceedings of the Conference on Computer Graphics (2000) 239–248.
[32] H. Zhang, D. Manocha, T. Hudson, K.E. Hoff III, Visibility culling using hierarchical occlusion maps, Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques (1997) 77–88.
[33] N. Greene, M. Kass, G. Miller, Hierarchical Z-buffer visibility, Proceedings of SIGGRAPH’93 (1993) 231–238.
[34] V. Koltun, Y. Chrysanthou, D. Cohen-Or, Virtual occluders: An efficient intermediate PVS representation, Proceedings of the 11th EUROGRAPHICS’00 on Rendering Techniques (2000) 59–70.
[35] T.L. Casavant, J.G. Kuhl, A taxonomy of scheduling in general-purpose distributed computing systems, IEEE Transactions on Software Engineering 14 (2) (1988) 141–154.
[36] D.L. Eager, E.D. Lazowska, J. Zahorjan, Adaptive load sharing in homogeneous distributed systems, IEEE Transactions on Software Engineering 12 (5) (1986) 662–675.
[37] L.M. Ni, C.-W. Xu, T.B. Gendreau, Drafting algorithm: a dynamic process migration protocol for distributed systems, Proceedings of the Fifth International Conference Distributed Computing Systems (1985) 539–546.
[38] N.G. Shivaratri, P. Krueger, Two adaptive location policies for global scheduling algorithms, Proceedings of 10th International Conference Distributed Computing Systems (1990) 502–509.
[39] N.G. Shivaratri, P. Krueger, M. Singhal, Load distributing for locally distributed systems, IEEE Computer 25 (12) (1992) 33–44.
[40] N.G. Shivaratri, M. Singhal, A load index and a transfer policy for global scheduling tasks with deadlines, Concurrency: Practice and Experience 7 (7) (1995) 671–688.
[41] C. Lu, S.-M. Lau, An adaptive load balancing algorithm for heterogeneous distributed systems with multiple task classes, Proceedings of the 16th International Conference Distributed Computing Systems (1996) 629–636.
[42] N. Dadoun, D.G. Kirkpatrick, J.P. Walsh, The geometry of beam tracing, Proceedings of the Symposium on Computational Geometry (1985) 55–71.
[43] T.A. Funkhouser, P. Min, I. Carlbom, Real-time acoustic modeling for distributed virtual environments, Proceedings of SIGGRAPH’99 on Computer Graphics (1999) 365–374.
[44] B. Humphrey, Unofficial Quake 3 BSP Format, Available through the Internet: http://www.GameTutorials.com.