Available online at www.sciencedirect.com
Journal of Systems Architecture 54 (2008) 145–160 www.elsevier.com/locate/sysarc
A new architecture for efficient hybrid representation of terrains
M. Amor a,*, M. Bóo b
a Department of Electronics and Systems, University of A Coruña, Spain
b Department of Electronics and Computer Engineering, University of Santiago de Compostela, Spain
Received 26 June 2006; received in revised form 25 April 2007; accepted 26 April 2007
Available online 22 May 2007
Abstract
Interactive visualization of highly detailed terrain models is a challenging problem, as the size of the data sets can easily exceed the capabilities of current hardware. Hybrid representation of terrains tries to solve the problem by combining the reduced size associated with multiple levels-of-detail of digital elevation models with the high quality associated with triangulated irregular networks. However, as the measurements of both representations have different origins, a direct representation of the hybrid system would result in discontinuities. In this paper, we present an architecture for hybrid representation of terrains based on a local convexification algorithm. The architecture we propose permits the generation of the additional triangles required to join the models and thereby avoid the discontinuities. The architecture is simple and regular, and a high triangle generation rate is achieved. Different optimizations have been performed that avoid waiting cycles between tessellator units and so increase the productivity rate of the system. Implementation results are shown for a Virtex-II FPGA as an application example.
© 2007 Elsevier B.V. All rights reserved.
Keywords: Hardware architecture; Hybrid representation model; Terrain visualization; FPGA implementation
* Corresponding author. Tel.: +34 981 167000x1215; fax: +34 981 167160. E-mail address: [email protected] (M. Amor).
1383-7621/$ - see front matter © 2007 Elsevier B.V. All rights reserved. doi:10.1016/j.sysarc.2007.04.005
M. Amor, M. Bóo / Journal of Systems Architecture 54 (2008) 145–160

1. Introduction
Interactive visualization of large terrain models is an important challenge in applications such as Geographic Information Systems (GIS), virtual reality, flight simulations, virtual city simulators or computer games. The main problem of such applications is the storage, transmission and processing requirements associated with high resolution terrain models. Despite the fast evolution of current graphics cards, these requirements limit the complexity of the terrain model that can be rendered when real time is desirable.

A method usually employed for representing terrain surfaces is the Digital Elevation Model (DEM) [15], which samples the elevation of the terrain at regular grid positions. Due to the large size of real terrain DEM data sets and the limited capabilities of graphics cards, different resolution levels (i.e. level-of-detail (LOD) representations [10]) are used extensively. Another representation method employed for terrain modelling is the Triangulated Irregular Network (TIN) [12,15]. The TIN is a mesh of triangles that approximates the surface. It is characterized by the high quality of the models generated but, as a drawback, by the number of triangles required.

In the real world, terrain information is usually composed of data of different types. For example, a cartographic terrain model can include grid data describing the DEMs and microstructures, based on TIN representations, describing morphologically complex terrain features such as riverbeds or ridges. Combining both representations would permit the exploitation of the DEM simplicity and the enrichment of this representation with detailed areas described by the TINs. However, as both data sets are obtained from different measuring systems, they are disconnected. As a consequence, a direct rendering of the mixed model would result in discontinuities at the junction of the models.

Recent proposals analyse the problem of representing hybrid terrain models directly. In [3,7], an adaptive tessellation procedure is suggested in order to avoid discontinuities at the junction between representations. The objective is to connect both meshes by introducing new triangles that join the borders of the TIN mesh with the grid cell corners. However, no specific tessellation algorithms were proposed. Based on this idea, a meshing scheme for hybrid terrain representations was presented in [5,1]. The algorithm allows an efficient tessellation of the neighbouring area between models. As an example, see Fig. 1, where the grid cells in the neighbourhood of the TIN structure are adaptively tessellated so that no discontinuities are generated. Although the authors suggest implementing the tessellation algorithm in hardware at the entrance of the Graphics Processing Unit (GPU), no specific implementation is proposed.

In this work, we concentrate on the design of an architecture for implementing the tessellation algorithm suggested in [5,1]. The authors suggest two possible algorithms, one based on a tessellation into triangles and another based on a tessellation into a combined system of triangles and sub-cells. Here, we focus on the first algorithm, which generates only triangles. We propose an architecture for hybrid model representation of terrains. Specifically, we have developed all hardware units associated with the algorithm and an efficient scheduling method to increase the processing rate of the algorithm. Based on our analysis, the proposed architecture is hardware efficient, with a low computational cost and high tessellation rates. We have implemented our proposal on a Xilinx Virtex-II FPGA [14]. We have optimized the implementation to reduce the hardware cost and increase the processing rate. As a result, our implementation requires only 592 slices, and the optimized critical path results in an operating frequency of 89 MHz.
Fig. 1. Hybrid terrain model representation: (a) grid; (b) TIN; (c) combined.

2. Hybrid meshing algorithm
In [5,1], a new meshing algorithm for hybrid representation of terrains combining multiresolution grids with high resolution TINs was presented. The method permits an efficient representation of the different LOD tessellations employed for adapting the meshes in a unified representation. The GPU receives the information and, once a specific LOD is selected, the tessellation for adapting the TIN and the grid can be simply and efficiently decoded. As this representation is generic and independent of the view point, the information has to be sent only once from the CPU to the GPU. This strategy reduces, on the one hand, the computational load of the CPU and, on the other, the communication requirements between CPU and GPU.

The basic framework of the proposal is schematically depicted in Fig. 2. The hybrid system is preprocessed in the CPU to encode a unified representation. This information is sent to the graphics card, where the triangles of the hybrid model are decoded/generated and rendered. With real-time rendering as an objective, a specific hardware unit can be included for implementing these algorithms. Similar techniques were proposed in other contexts, for example for adaptive tessellation of triangle meshes based on displacement map information [6,4,2], where the adaptive tessellations are performed in specific hardware.

The algorithm proposed in [5,1] is a good candidate for a hardware implementation. The simple tessellation scheme proposed is based on a convex TIN Boundary structure inside each partially covered cell. As the premise of local convex structures is restrictive, the authors propose an efficient generation and encoding of incremental convexifications. This permits the encoding of the convexification associated with any LOD as a pre-processing step in the CPU. As a result, the decoding algorithm to be implemented in hardware is based on addition and comparison operations. The simple units to be implemented, together with the associated low storage requirements, make this algorithm suitable for implementation in hardware.

In this paper, we present the proposed architecture for supporting this hybrid terrain representation method in hardware. This hardware unit generates the triangles of partially covered cells for convex TIN boundaries. It also decodes the triangles associated with the convexification procedure previously performed in the CPU. As will be shown in this paper, the algorithms involved have simple computational requirements and, additionally, the scheduling we propose ensures a high tessellation rate.
Fig. 2. Generic structure of the system. [Block diagram labels: CPU (Hybrid Model Codification); TIN + GRID + Hybrid Model Representation; GRAPHICS PIPELINE, containing the Adaptive Tessellation Unit and the Standard Graphics Pipeline.]

The algorithm proposed in [5,1] is based on the pre-processing and efficient encoding of local convexifications in the CPU. This pre-processing is detailed in the following subsections. After this, the tessellation algorithm is described and its hardware implementation is presented.

2.1. Incremental convexification and hybrid model representation
The algorithm presented in [5,1] is based on the local convexification of the TIN Boundaries and the efficient representation of the information to be sent to the graphics card. In this subsection we summarize the basis of the convexification procedure.

The CPU performs the convexification incrementally, starting from the finest grid resolution level. For each partially covered cell, the convex hull of the TIN boundary is processed and the caves between the TIN boundary and the convex hull are tessellated. Similar procedures were previously employed in the context of non-convex tetrahedral mesh visualization [13,8,9]. As an example, Fig. 3a shows four cells corresponding to the finest resolution level of the grid. The TIN is depicted in grey, the TIN boundary is explicitly marked and the vertices on it are indicated with labels. The local convexification and triangulation results are shown in Fig. 3b. Local convex hulls are delimited by vertices {0,1,3,4} (up-left cell), {4,12} (down-left cell) and {12,13,15,16} (down-right cell). Once the convex hull inside each cell is determined, any standard tessellation algorithm could be applied for generating triangles inside the caves [11].

The process is repeated sequentially for the consecutive coarser levels. As previously indicated, an incremental strategy is followed that preserves the triangles generated in previous convexifications. Continuing the example of Fig. 3, the next level of detail is convexified in Fig. 4.
Fig. 3. (a) Four cells of the grid with the TIN silhouette and (b) local convexification. [Both panels label the TIN boundary vertices 0 to 16.]
The key to this incremental convexification algorithm is that the convexification triangles corresponding to the different levels of detail can be efficiently encoded in a unified representation. This information, together with the grid and the TIN, is sent to the graphics card to generate all the adaptive tessellations required for any LOD. The additional information to be preprocessed in the CPU and sent to the graphics card is described in the following.

2.2. Convexification representation
The TIN boundary (TB) representation includes the TIN boundary vertices together with connectivity information. The connectivity representation compactly stores the triangles generated for all the different convexification levels. The TIN boundary is stored as a clockwise ring, and the connectivity associated with a vertex indicates the distance (number of vertices) between that vertex and the most distant vertex in the ring connected to it. In this way, if the connectivity of vertex $v_i$ is $j$, the farthest vertex connected to it is $v_{i+j}$. To clarify this, let us consider the example depicted in Fig. 4.

Fig. 4. Convexification of the coarsest level. [The figure labels the TIN boundary vertices 0 to 16.]

The TB information can be summarized by the array:

$$\mathrm{TB} = [\,0(1),\ 1(14),\ 2(1),\ 3(10),\ 4(9),\ 5(6),\ 6(1),\ 7(2),\ 8(1),\ 9(1),\ 10(1),\ 11(1),\ 12(1),\ 13(2),\ 14(1),\ 15(1),\ 16(1)\,]$$

where the connectivity value of each vertex is indicated in parentheses. Vertex 0 is connected with the following one (vertex 1); vertex 1 is connected with the vertex at distance 14, i.e. vertex 15, and so on. Let us analyze vertex 4, with connectivity 9. This means that vertex 4 is connected with vertex 13, and that all vertices between them are in the cavity. These vertices are: 5(6), 6(1), 7(2), 8(1), 9(1), 10(1), 11(1), 12(1). The connectivity values indicate two nested cavities: between 5 and 11 and between 7 and 9. The algorithm assumes a sequential connection of the starting vertex of a cavity to all vertices inside it, but this connecting structure is broken if nested caves exist. In this example, vertex 4 is connected to all vertices between 5 and 13 that are not inside a nested cavity, that is: {5,11,12,13}. Vertex 5 is connected with all vertices between 6 and 11 that are not inside a nested cavity, that is: {6,7,9,10,11}, and so on.

As explained in [5,1], the connectivity values generated for the coarsest LOD can be employed for any LOD, and this unified representation can be employed for the reconstruction of the convexification triangles associated with the different levels of detail. The connectivity information required is small, as only one index per vertex has to be included. The simplicity of the tessellation procedure, as well as the efficient data management obtained, is directly related to this representation.
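The decoding rule just described can be illustrated with a short sketch. It is not taken from [5,1]: the function name `connected_vertices` and the list layout are assumptions made here for illustration, and the sketch assumes properly nested cavities that lie entirely inside the array, as in the Fig. 4 example.

```python
# Connectivity values of the TB array from the Fig. 4 example.
# The list index is the vertex index (layout assumed for illustration).
TB_CONN = [1, 14, 1, 10, 9, 6, 1, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1]


def connected_vertices(conn, root):
    """Vertices joined directly to `root`, skipping the interior of nested cavities."""
    end = root + conn[root]      # farthest vertex connected to root
    out = []
    j = root + 1
    while j <= end:
        out.append(j)
        if j == end:
            break
        # Connectivity 1 steps to the next vertex; a larger value jumps
        # over a nested cavity to the vertex that closes it.
        j += conn[j]
    return out


print(connected_vertices(TB_CONN, 4))   # [5, 11, 12, 13]
print(connected_vertices(TB_CONN, 5))   # [6, 7, 9, 10, 11]
```

Both printed sets agree with the ones listed in the text for vertices 4 and 5.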
2.3. Correspondence between representation models
The representation model requires two additional pieces of information: the grid classification list and the vertex classification list. The first list permits the identification of the cells to be directly rendered and the cells to be tessellated. The second list indicates the TB vertices to be employed for the cell tessellations. The use of these two lists simplifies the tessellation procedure, as the cells to be tessellated and the TB vertices involved in the tessellation can be directly identified.

The grid classification list (GC) classifies the cells as non-covered (NC), partially covered (PC) and completely covered (CC). The codes employed are 0 (for NC cells), 10 (for CC cells) and 11 (for PC cells), assuming that cells are most frequently NC. As an example, the GC list for the cells of Fig. 3 is (cells in row order, from left to right):

$$\mathrm{GC} = \{11,\ 0,\ 11,\ 11\}$$

The vertex classification list (VC) stores the following information per PC cell: {(A,L),(C,I)}. (A,L) indexes the TIN boundary vertices inside the cell. A indicates the index (in the TIN boundary array) of the first vertex of the section, while L indicates the number of consecutive vertices included in the section (consecutive positions in the TIN boundary array). On the other hand, (C,I) indicates the corner information for the tessellation. Specifically, C indicates the starting corner and I the number of uncovered corners. As an example, the vertex classification list for the three PC cells of Fig. 3 is:

$$\mathrm{VC} = \{(0,5)(0,3),\ (4,9)(1,1),\ (12,5)(0,2)\}$$

Note that the cells are listed in row order and that the corner notation employed is: 0 (up-left), 1 (up-right), 2 (down-right) and 3 (down-left).

3. Tessellation algorithm
In this section, we summarize the main algorithm steps to be performed for the convexification, reconstruction and final tessellation of the convex TIN.
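The GC and VC lists of Section 2.3 select the input of the steps summarized in this section, and reading them can be sketched as follows. This is only an illustration: the paper does not state how the GC codes are packed, so the concatenated bit string is an assumption, and the names `decode_gc`, `pc_cell_entries` and `section_indices` are hypothetical.

```python
def decode_gc(bits):
    """Decode the prefix code 0 = NC, 10 = CC, 11 = PC from a bit string."""
    cells, i = [], 0
    while i < len(bits):
        if bits[i] == "0":
            cells.append("NC")
            i += 1
        else:
            cells.append("PC" if bits[i + 1] == "1" else "CC")
            i += 2
    return cells


def pc_cell_entries(cells, vc):
    """Pair each PC cell (in row order) with its VC entry ((A, L), (C, I))."""
    entries = iter(vc)
    return {idx: next(entries) for idx, kind in enumerate(cells) if kind == "PC"}


def section_indices(a, l):
    """TB indices A, A + 1, ..., A + L - 1 of one section."""
    return list(range(a, a + l))


# Fig. 3 example: GC = {11, 0, 11, 11}, assumed here to be concatenated.
GC_BITS = "1101111"
VC = [((0, 5), (0, 3)), ((4, 9), (1, 1)), ((12, 5), (0, 2))]

cells = decode_gc(GC_BITS)
print(cells)                              # ['PC', 'NC', 'PC', 'PC']
print(pc_cell_entries(cells, VC)[2])      # ((4, 9), (1, 1))
print(section_indices(4, 9))              # [4, 5, 6, 7, 8, 9, 10, 11, 12]
```

Because no code word is a prefix of another, the list can be decoded by looking at one or two bits at a time, with the shortest code assigned to the NC cells that the text assumes to be the most frequent.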
After this, in the following section, we present the architecture we propose for the implementation of these algorithms.

3.1. Convexification triangles generation
The algorithm involved in the reconstruction of the triangles associated with the convexification is presented first. Let us denote the TIN boundary vertices inside the cell as $\{v_n, \ldots, v_{n+L-1}\}$ and their connectivity values as $\{v_n.c, \ldots, v_{n+L-1}.c\}$. $T_1$, $T_2$ and $T_3$ denote the vertices of the generated triangle $T$. The starting vertex of each nested cavity under construction is stored in an array called $S_k$ with $k > 0$, and $S_k$.count temporarily stores the number of already processed vertices inside a cavity once a nested cavity $k$ is opened.

The structure of the algorithm is as follows (see the pseudo-code in Fig. 5). The TIN boundary vertices $(v_n, \ldots, v_{n+L-1})$ that fall into the cell are sequentially processed (line 2). All vertices not included in any cavity (line 4) are considered part of the convex hull and stored in a convex hull list (line 5). Once a cavity starts, the following vertices build the triangles in the cavity under construction (lines 7–13). When a triangle is completed (line 10), it is sent to the standard graphics pipeline to be rendered (line 11). If not, the vertex is stored as part of the following triangle under construction (line 12). Additionally, when a vertex closes a cavity, the starting vertex of the cavity is eliminated from the S list and an additional external triangle is identified (lines 14–26). On the other hand, if the vertex closes the outermost cave, it is also included in the convex hull list (lines 27 and 28). Additionally, if the connectivity of the vertex is larger than one, it is included in the S list as the starting point of a new nested cavity (lines 30–38).

This procedure permits the reconstruction of the triangles associated with the convexification algorithm. As can be derived from the algorithm code, the computational requirements are low, as only additions and comparisons are involved.
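The steps above can be sketched in software as a single pass with a stack of open cavities. The sketch below is reconstructed from the prose description only; it is not a transcription of the pseudo-code of Fig. 5, the name `convexify` and the stack layout are assumptions, and it only covers the case in which every cavity is closed inside the section being processed (as in the coarsest level of Fig. 4).

```python
# Connectivity values of the Fig. 4 example, indexed by vertex.
TB_CONN = [1, 14, 1, 10, 9, 6, 1, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1]


def convexify(conn, a, l):
    """Return (hull, triangles) for the section of l vertices starting at index a."""
    hull, triangles = [], []
    stack = []   # open cavities, innermost last: [root, closing vertex, previous vertex]
    for v in range(a, a + l):
        if not stack:
            hull.append(v)                       # outside every cavity
        else:
            root, _, prev = stack[-1]
            if prev is not None:
                triangles.append((root, prev, v))
            stack[-1][2] = v
            while stack and stack[-1][1] == v:   # v closes one or more cavities
                stack.pop()
                if stack:
                    root, _, prev = stack[-1]
                    triangles.append((root, prev, v))   # external triangle
                    stack[-1][2] = v
                else:
                    hull.append(v)               # v closes the outermost cavity
        if conn[v] > 1:                          # v opens a new cavity
            stack.append([v, v + conn[v], None])
    return hull, triangles


hull, triangles = convexify(TB_CONN, 0, 17)
print(hull)             # [0, 1, 15, 16]
print(triangles[1:6])   # [(5, 6, 7), (7, 8, 9), (5, 7, 9), (5, 9, 10), (5, 10, 11)]
```

For the section (0,17) of the Fig. 4 example the sketch returns the convex hull {0,1,15,16} and, for vertices 5 to 11, the same triangle sequence that Table 1 in Section 4.1 lists. Only additions and comparisons are needed, in line with the remark on the computational requirements.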
Fig. 5. Tessellation algorithm for the convexification triangles.

3.2. Corner triangles generation
The corner tessellation procedure receives the TIN boundary vertices of the convex hull (after the convexification procedure) together with the sorted corners for the tessellation procedure (information contained in the VC list). The tessellation algorithm involves the steps indicated in the pseudo-code of Fig. 6. For reasons of clarity, we have relabelled the vertices of the convex hull with consecutive superscripts. We employ superscripts to emphasize that convex hull vertices do not necessarily correspond to consecutive vertices of the TIN boundary. We consider $m$ vertices $v^i$ with $i = 0, \ldots, m-1$ in the convex hull. The $r$ uncovered corners are also relabelled for simplicity as $c^j$ with $j = 0, \ldots, r-1$.

Fig. 6. Tessellation algorithm for the corner triangles.

The first triangle is composed of the first corner and the first two vertices (lines 1–5). After this, the following vertices of the convex hull (line 8) are processed to compose the intermediate triangles. The corner shift control is performed through the evaluation of the sign of the z-coordinate of two cross-products (lines 10 and 11). When no corner shift is required, the triangle is built from the current corner and two consecutive vertices (lines 12–15). When a corner shift is required, the triangle is built from the two corners and the current vertex (lines 16–20). This vertex is reused as part of the following triangle (line 19). Once all convex hull vertices have been processed, the remaining non-processed corners have to be considered (lines 21–28). The final triangles are composed of the last vertex of the convex hull and two consecutive corners.

This simple procedure permits the reconstruction of the triangles associated with the convex hull of the TIN boundary and the cell corners. The computational requirements of the algorithm are very low, as they are mainly associated with the corner shift control.
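The arithmetic behind the corner shift control can be illustrated with a minimal sketch. Which two cross-products are compared is defined by the pseudo-code of Fig. 6, which is not reproduced here, so the sketch shows only the sign primitive and the final fan over the remaining corners; the names `cross_z` and `closing_fan` are hypothetical.

```python
def cross_z(o, a, b):
    """z-component of (a - o) x (b - o), using only the x and y coordinates."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def closing_fan(last_vertex, corners):
    """Final triangles: the last hull vertex joined to consecutive remaining corners."""
    return [(last_vertex, corners[k], corners[k + 1])
            for k in range(len(corners) - 1)]


# The sign of cross_z tells on which side of the line o -> a the point b lies.
print(cross_z((0, 0), (1, 0), (0, 1)))    # 1  (counter-clockwise turn)
print(cross_z((0, 0), (0, 1), (1, 0)))    # -1 (clockwise turn)
print(closing_fan("v3", ["c1", "c2", "c3"]))
# [('v3', 'c1', 'c2'), ('v3', 'c2', 'c3')]
```

Each evaluation of `cross_z` needs two multiplications, so evaluating two such products needs four, which is consistent with the four multipliers that Section 4.2 attributes to the Corner Triangle Generator.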
4. Tessellation architecture
In previous sections, the algorithm for adaptive tessellation and representation of hybrid terrain models was summarized. In this section, we present an architecture proposal for implementing this algorithm in hardware. The main challenge of the hardware implementation is the dependence between tessellation units. Specifically, the Corner Unit productivity is limited by the availability of convex hull vertices for tessellation and, therefore, by the productivity rate of the Convexification Unit. As will be shown in the following, the use of temporary buffers for storing the input to the Corner Unit, together with the efficient scheduling we propose, achieves a high productivity rate and avoids waiting cycles in both tessellation units.

The general scheme of the architecture is shown in Fig. 7. It is mainly composed of two computational modules: the Convexification and the Corner units. In the first unit, the local convex hull inside each cell is built and the associated triangles are generated according to the algorithm explained in Section 3.1. In the second unit, the triangles between cell corners and convex hull vertices are obtained according to the corner triangle generation of Section 3.2. To clarify, the data values corresponding to the example of Fig. 4 are indicated in the figure.

The remaining units of the figure are basically temporary storage systems for the hybrid representation of the terrain. Specifically, the TB_connectivity Memory and the TB_coordinates Memory are storage systems for the TIN Boundary information. The connectivity of each vertex is stored in the first unit, while the coordinates are stored in the second unit (in our example, vertex 0 has connectivity 1, vertex 1 has connectivity 14, and so on). The reason for employing a double bank memory is to allow simultaneous accesses from the Convexification Unit and the Corner Unit. In the same way, the VC list is split into two storage systems: the section information (A,L) is stored in the VC_section Memory ((0,17) in our example), while the corner information (C,I) is stored in the VC_corner Memory ((0,2) in our example). The GC Memory stores the GC information associated with each cell (as the cell in our example is PC, the code 11 is stored). In the following, the two computational units (Convexification and Corner units) are analyzed together with the storage systems associated with each one.

4.1. Convexification unit
Fig. 7. Architecture model. [Block diagram: Convexification Unit and Corner Unit, each producing triangles (T1,T2,T3); TB_con. Memory holding the connectivity values 1, 14, ..., 1 and supplying $v_i$.c; TB_coord. Memory holding 0(x,y,z), 1(x,y,z), ..., 16(x,y,z); VC_section Memory holding (A,L) = (0,17); VC_corner Memory holding (C,I) = (0,2); GC Memory holding the code 11.]

Fig. 8. Convexification unit. [Block diagram: Indices Generator, fed with (A,L) from the VC_section Memory and sending the vertex index i to the TB_con Memory; Convexification Triangle Generator, fed with $v_i$.c and producing T1, T2, T3, T_flag, CH and CH_flag; S, S.c and S.count lists with shift control; Nested Cave Adder with add control and Cave_count.]

In this unit, the convexification is performed, i.e. the convex hull vertices are identified and the convexification triangles are generated. As indicated in Fig. 8, it basically consists of a stack for storing the information associated with the caves (S, S.c and S.count lists), a counter for generating consecutive vertex indices of a given section and an adder for counting vertices inside a cave.

In detail, the section information (A,L) is read from the VC_section Memory. This information is employed in the Indices Generator, which basically consists of an incrementer that generates the consecutive vertex indices of the section $(A, A+1, A+2, \ldots, A+L-1)$. These indices are employed to address the TB_connectivity Memory in order to read the connectivity values associated with each vertex in the section.

The Convexification Triangle Generator, which is basically the control unit of the system, detects when a cavity starts ($v_i.c > 1$) and stores the information in the stack (S and S.c lists). If a nested cavity is detected, the partial count of vertices inside the current cavity is also stored (S.count list). In the case of nested cavities, the S.count value is conditionally added in the Nested Cave Adder, so that partial sums are added together. Once the cavity is closed, the Convexification Triangle Generator shifts the information out of the stack (S, S.c and S.count lists).

The Convexification Triangle Generator also manages the three registers for storing the indices of the vertices of each convexification triangle. Once a triangle is generated, a T_flag signal is activated. On the other hand, all vertices of the local convex hull (i.e. those not inside a cave) are indicated with a CH_flag signal and are sent to the Corner Unit.

To clarify the functionality of each unit, and as an example of application, we detail in Table 1 the content/output of each unit for some consecutive processing cycles corresponding to the cell indicated in Fig. 4. The columns represent the vertex to be processed and its connectivity, the contents of the cave lists, the cave_count value, the triangle under construction, and the T_flag and CH_flag signals. In this example, we analyze vertices 5–11. As these vertices lie inside three nested cavities, the CH_flag is always 0. The algorithm processes each vertex, increments the counter and includes the vertex as part of the triangle defined by the previous vertex and the root of the cave. This sequence is broken when a new cave starts or is closed.

For example, the first vertex detailed in the table is vertex 5 (first row). The connectivity of the vertex, 6, indicates that vertex 5 starts a cave. For this reason, the vertex is introduced in the $S_k$ list, the connectivity in the $S_k$.c list and the count value (1) in the $S_k$.count list. The cave_count is initialized to start the counting of vertices inside this new cave. The following triangle to be built includes vertex 5 (root of the new cave). The next vertex starting a cave is vertex 7 (connectivity 2). The incremented counter value (2) is stored, together with the vertex information, in the cave lists. The counter is initialized to start the counting of vertices of the new cave.

The first vertex closing a cave is vertex 9. In this case, the count value is 2, which means that 2 vertices were processed since the last cave started (cave with root vertex 7). Taking into account that the connectivity value of vertex 7 is 2 (last position of the $S_k$.c list), the end of the cave is detected. The information associated with vertex 7 is eliminated from the cave lists, and the $S_k$.count value is added to the cave_count value. This value is now 4, which means that 4 vertices were processed since the last open cave was started (cave with root vertex 5). The procedure continues until all vertices are processed.

In summary, the Convexification Unit is very simple, as it basically consists of a stack system, a counter and an adder. The complexity of the system is mainly associated with the control. In any case, as will be shown in the results section, this complexity is low
Table 1
Convexification unit: detailed data for the example of Fig. 4 (in the cave lists, ← marks a value being stored and → a value being removed)

| Vertex i ($v_i$.c) | $S_k$ | $S_k$.c | $S_k$.count | Cave_count | Triangle (T1,T2,T3) | T_flag | CH_flag |
|---|---|---|---|---|---|---|---|
| 5(6) | {1,3,4} ← 5 | {14,10,9} ← 6 | {2,1} ← 1 | 1 → 0 | (5, , ) | 0 | 0 |
| 6(1) | {1,3,4,5} | {14,10,9,6} | {2,1,1} | 1 | (5,6, ) | 0 | 0 |
| 7(2) | {1,3,4,5} ← 7 | {14,10,9,6} ← 2 | {2,1,1} ← 2 | 2 → 0 | (5,6,7) | 1 | 0 |
| 8(1) | {1,3,4,5,7} | {14,10,9,6,2} | {2,1,1,2} | 1 | (7,8, ) | 0 | 0 |
| 9(1) | {1,3,4,5} → 7 | {14,10,9,6} → 2 | {2,1,1} → 2 | 2 + 2 = 4 | (7,8,9) | 1 | 0 |
| 9(1) | {1,3,4,5} | {14,10,9,6} | {2,1,1} | 4 | (5,7,9) | 1 | 0 |
| 10(1) | {1,3,4,5} | {14,10,9,6} | {2,1,1} | 5 | (5,9,10) | 1 | 0 |
| 11(1) | {1,3,4} → 5 | {14,10,9} → 6 | {2,1} → 1 | 6 + 1 = 7 | (5,10,11) | 1 | 0 |
and the system can be implemented in a very efficient way and with reduced hardware requirements.

4.2. Corner unit
The Corner Unit receives the convex hull vertex indices (from the Convexification Unit) and, together with the sorted corners, generates the tessellation triangles. The general structure of the unit is detailed in Fig. 9. To build the triangles, the Corner Triangle Generator, the main core of the system, determines when a shift in the grid corners must be performed; this is controlled by a simple comparison of slopes. To clarify, the data corresponding to the example of Fig. 4 are specified in the figure.

In detail, let us start with the corner coordinates generation performed in the Corner_Coordinates Generator. The coordinates generation is very simple, as we process the cells in sequential order (incremental in x-coordinate, incremental in y-coordinate). The unit therefore consists of two simple counters for keeping track of the number of cells in the x-direction and in the y-direction and two adders for generating the four coordinates of the cell. Note that in the figure, the corners are labelled with superscripts to emphasize that the corners are cyclically listed starting with the corner specified in the VC list. In the example of Fig. 4 the C field is 0, so that the corners are listed from $C^0$ to $C^3$.

The convex hull vertices generated by the Convexification Unit are temporarily stored in an input buffer, the CH Memory, to be processed on demand by the Corner Unit. In our example these vertices are: {0,1,15,16}. This temporary storage breaks the dependences between the Corner Unit and the Convexification Unit. It avoids waiting cycles and so increases the triangle productivity rate.

The CH Memory sequentially stores the convex hull vertices coming from the Convexification Unit. To identify the consecutive set of convex hull vertices corresponding to the same TB section, a CH_Pointer Memory is included. This memory stores the index of the first convex hull vertex of each processed section. In our example this index has value 0.

During the tessellation procedure, the convex hull indices read from the CH Memory are employed to address the TB_coordinates Memory to read the vertex coordinates. These coordinates
Fig. 9. Corner unit. [Block diagram: CH Memory holding the convex hull vertex indices 0, 1, 15, 16, written under CH_flag; CH_pointer Memory holding 0; Corner Triangle Generator, fed with $v_i$.coordinate from the TB_coord Memory and $C^j$.coordinate from the Corner_coord Generator, producing T1, T2, T3 and T_flag; inputs GC (from the GC Memory), C and I.]
are employed in the Corner Triangle Generator to produce the final triangles. In general, triangles are composed of current corner, Cj, and consecutive vertices of the convex hull, vi and vi+1. The Corner Triangle Generator decides when to shift the cell corners by evaluating the sign of the z-coordinate of two cross-products. When a corner shift is detected, two triangles can be obtained due to the regular and sequential structure of our algorithm. The first triangle is composed of two corners, Cj and Cj+1 and the current vertex of TB, vi, and the second triangle is composed of the new current corner, Cj+1 and two vertices of the TB, vi and vi+1. Then, the Corner Triangle Generator basically consists of four multipliers and two adders. To avoid idle cycles the Corner Triangle Generator evaluates in advance the triangles associated to vertices vi+1 and vi+2 while triangles associated with vi and vi+1 are built. Additionally, the Corner Triangle Generator has two counters to keep count of the number of processed vertices and corners. The number of convex hull vertices associated to a given section can be directly obtained from the CH_Pointer Memory where the initial convex hull vertex index per section is stored. As a result, when all convex hull vertices of a section have been processed, the following triangles are composed by the last vertex and consecutive non-processed corners. The cell is completely processed when the complete set of non-covered corners have been evaluated. 5. Results In this section, we present the main results of our implementation. As indicated in [5], the hardware
design, in terms of word size and storage requirements, is determined by the application, and this mainly depends on the cartographic area to be represented. We have employed as a reference for our implementation the three models depicted in Figs. 10–12. As TINs usually cover smaller areas of terrain, we consider that these models exceed the requirements of usual applications. With respect to the quality obtained, Figs. 10a, 11a and 12a represent the regular grids while Figs. 10b, 11b and 12b represent the final models (grid plus TIN). The quality benefits of including the TIN models are clearly visible and, as expected, no discontinuities are generated. Note that the final models represented in these figures include triangles associated with the TIN (in white), with the convexification procedure (in black) and with the corner tessellation algorithm (in grey). In the following, we present an analysis of the implementation in terms of hardware and timing requirements.

5.1. Architecture evaluation

The architecture has been implemented in a Virtex-II 2000 FPGA using the Xilinx Tools. Our implementation supports more than one TB section per PC cell and is therefore general enough to process very irregular TIN shapes [5]. Note that this increases the complexity of the algorithm and its implementation, mainly of the control system. In exchange, it extends the range of applicability of the architecture. The computational complexity of the design is very low. As was detailed in previous sections, few
Fig. 10. Model 1: (a) grid; (b) final model.
Fig. 11. Model 2: (a) grid; (b) final model.
Fig. 12. Model 3: (a) grid; (b) final model.
and simple arithmetic units are required, and most of the complexity is due to the control units. Even the control units can be implemented with simple arithmetic units (counters, comparators, ...). The storage requirements are also low. As a result, the design can be implemented in a low-cost FPGA. Based on the applications we have worked with, we have assumed that the TIN Boundary has fewer than 512 vertices and no more than 256 sections, with up to four sections per cell and at most 64 vertices per section, and that the maximum connectivity value is 32. We have also found that in practical applications up to four nested cavities are generated. Our test images have up to 2500 cells and 150 partially covered cells. These
quantities, as indicated in Table 2, determine the memory requirements of our system. In detail, the TB_connectivity Memory stores up to 512 connectivity values (up to 32, that is, 5 bits). The TB_coordinates Memory stores the 512 TB coordinates, each

Table 2
Memory size requirements (number of words × bits per word)

Memory                      Size
TB_connectivity             512 × 5
TB_coordinates (x, y, z)    512 × (32, 32, 32)
VC_section                  256 × 17
VC_corner                   150 × 5
GC                          2500 × 2
CH                          512 × 9
CH_pointer                  150 × 9
one with 32 bits per component. The VC_section Memory stores the section information (A,L). There are up to 256 sections, and for each section we store the number of sections identified in the same cell (up to 4, that is, 2 bits), the index of the first vertex (up to 512, that is, 9 bits) and the number of vertices in the section (up to 64, that is, 6 bits), giving 17 bits per word. The VC_corner Memory stores the corner information (C,I). We have up to 150 PC cells; for each cell, 2 bits are required to encode the first corner and 3 additional bits encode the number of corners to be employed. The GC Memory stores the cell classification (PC, CC, NC) with 2 bits per cell, where grid systems with 2500 cells were considered. The CH Memory stores the indices of the convex hull vertices generated by the Convexification Unit. In consequence, the memory has 512 words of 9 bits. Finally, the CH_pointer Memory is employed as a pointer into the CH Memory to identify the first convex hull vertex inside each PC cell. As we have 150 PC cells, we require a memory with 150 words of 9 bits. All memories were implemented with block RAMs, which are faster than distributed memories. The word sizes indicated above determine the data path widths of the system. Our implementation employs dedicated multipliers for the slope evaluation/comparison operation (four multipliers are required, two per cross-product). The resulting number of slices and operating frequencies are shown in Table 3. We have included two sets of data: the first for a reduced precision in the slope computation/comparison (10 bits) and the second for full integer precision (32 bits). To reduce the timing and hardware requirements of the system, we have analyzed the possibility of employing a low-resolution system for the slope evaluation/comparison in the Corner Triangle Generator.
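The memory sizes of Table 2 follow directly from the bounds stated above. As a cross-check, the following minimal Python sketch recomputes each word width and the raw storage total. The constant names and the helper `bits_for` are illustrative assumptions, not part of the original design, and the total ignores how words are actually packed into block RAMs.

```python
def bits_for(count: int) -> int:
    """Bits needed to encode `count` distinct values (0 .. count-1)."""
    return max(1, (count - 1).bit_length())

# Bounds quoted in the text (names are illustrative).
MAX_TB_VERTICES = 512
MAX_SECTIONS = 256
MAX_SECTIONS_PER_CELL = 4
MAX_VERTICES_PER_SECTION = 64
MAX_CONNECTIVITY = 32
MAX_CELLS = 2500
MAX_PC_CELLS = 150
COORD_BITS = 32            # per coordinate component (x, y, z)
CELL_CLASSES = 3           # PC, CC, NC

# name -> (number of words, bits per word)
memories = {
    "TB_connectivity": (MAX_TB_VERTICES, bits_for(MAX_CONNECTIVITY)),
    "TB_coordinates": (MAX_TB_VERTICES, 3 * COORD_BITS),
    "VC_section": (
        MAX_SECTIONS,
        bits_for(MAX_SECTIONS_PER_CELL)        # sections in the same cell
        + bits_for(MAX_TB_VERTICES)            # index of the first vertex
        + bits_for(MAX_VERTICES_PER_SECTION),  # vertices in the section
    ),
    "VC_corner": (MAX_PC_CELLS, 2 + 3),  # first corner + corner count, as quoted
    "GC": (MAX_CELLS, bits_for(CELL_CLASSES)),
    "CH": (MAX_TB_VERTICES, bits_for(MAX_TB_VERTICES)),
    "CH_pointer": (MAX_PC_CELLS, bits_for(MAX_TB_VERTICES)),
}

total_bits = sum(words * width for words, width in memories.values())

for name, (words, width) in memories.items():
    print(f"{name:16s} {words:5d} x {width:2d} = {words * width:6d} bits")
print("total:", total_bits, "bits")
```

Under these assumptions the raw payload comes to 67,772 bits, dominated by the TB_coordinates Memory.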
A global coordinate system is not required for local comparisons, so a local coordinate system can be employed directly for these control computations. Additionally, the word size can be reduced even further, and
Table 3
Architecture evaluation: number of slices and frequency

                        Slices (10 bits)   Slices (32 bits)   MHz (10 bits)   MHz (32 bits)
Convexification Unit    366 (3%)           373 (3%)           89              89
Corner Unit             226 (2%)           376 (3%)           109             72
Design                  592 (5%)           684 (6%)           89              72
together with this, the computational and hardware requirements. This reduction comes at the cost of accuracy in the slope computation/comparison. However, our simulations show no erroneous results when the word size is reduced to 10 bits. In consequence, this word size can be employed for the slope computation/comparison with no effect on the triangles generated. In any case, an erroneous decision would only result in a very small triangle or in triangles with almost collinear vertices, which would probably not be noticeable in a real system.

Most of the slices of the Convexification and Corner units are devoted to control. We could reduce the complexity of these control units by reducing the capabilities of the system (number of sections per cell, unusual TIN Boundary structures, ...). The operating frequency of the 10-bit design is determined by the Convexification Unit; specifically, the critical path is associated with the control system of this unit. Reducing the complexity of the applications to be processed (complexity of the TIN Boundary) would result in a higher frequency. For the 32-bit design the operating frequency is limited by the Corner Unit, specifically by the slope evaluation/comparison. The resulting device utilization indicates that only 5–6% of the device is employed. This permits mapping the design onto a smaller device or using the remaining resources to include other tessellation proposals [5].

5.2. Timing evaluation

In this section, we briefly evaluate the timing associated with our implementations. The objective, in terms of triangle generation rate, is to obtain high productivity so that this unit does not become the bottleneck of the system. The timing sequence of the algorithm is irregular, as it is based on conditional decisions. The generation of triangles for the convexification follows an irregular time pattern.
The Corner Unit has a more regular behaviour, but it processes the data coming from the first unit. This leads to an irregular processing rate that has to be analyzed and optimized. Starting with the Convexification Unit, the generation of triangles does not follow a regular time pattern. Triangles are generated only inside a cave. Moreover, the generation rate depends on whether we are opening a cave, inside a cave,
In 23 cycles, four convex hull vertices are identified and 13 triangles are generated. This is a very high productivity rate, taking into account the irregularity of the TIN Boundary of the example. As previously indicated, the irregular convex hull productivity suggests the use of a buffer to store this information at the input of the Corner Unit. In this way, the Corner Unit consumes this information on its own demand, avoiding complicated communication protocols and waiting cycles between units. Taking this into account, the productivity rate of this unit is close to one triangle per cycle. When there are no CH vertices stored in the CH Memory the productivity rate can be lower, as this unit depends on data coming from the Convexification Unit. As an example, Fig. 14 shows the main Corner Unit signals for the example of Fig. 4. In the figure, the convex hull composed of vertices {0,1,15,16} is connected to the two uncovered corners c0 and c1. As a result, the following triangles are generated, one per cycle: T(0,c0,1), T(c0,1,c1), T(1,c1,15), T(15,c1,16). Once the system was initialized, one triangle per cycle was generated for this example. Globally, both units generate triangles following an irregular time pattern. We have developed a scheduling methodology based on the segmentation of the processing units to minimize waiting cycles and to increase the productivity rates. The strategy employed is based on the temporary storage of the intermediate results, to be delivered on demand to
closing a root cave, or closing a nested cave. This means that the triangle generation rate depends on the TIN Boundary structure. The productivity rate obtained with our implementation is very high. The convex hull vertex identification rate is one per cycle when we are not in a cave. Triangle generation inside a cave has been optimized to almost one per cycle. This was achieved by overlapping the computations of the current triangle with those of the following one, which is possible because consecutive triangles share vertices. As a simple example we can analyze a regular cave, without internal nested caves, between vertices vi and vi+k. The triangles inside the cave are (vi, vi+1, vi+2), (vi, vi+2, vi+3), (vi, vi+3, vi+4), ..., (vi, vi+k−1, vi+k). Thus, for regular cave structures, once the first triangle is generated only one additional vertex per triangle is required, and a rate of one triangle per cycle can be achieved. Let us analyze again the irregular TIN boundary of Fig. 4. The main signals involved are depicted in Fig. 13. As can be observed, the convex hull vertices 0, 1, 15 and 16 are detected in cycles 1, 2, 22 and 23, while the following triangles are generated: T(1,2,3) in cycle 4, T(5,6,7) in cycle 8, T(7,8,9) in cycle 10, T(5,7,9) in cycle 11, T(5,9,10) in cycle 12, T(5,10,11) in cycle 13, T(4,5,11) in cycle 14, T(4,11,12) in cycle 15, T(4,12,13) in cycle 16, T(3,4,13) in cycle 17, T(1,3,13) in cycle 18, T(13,14,15) in cycle 20 and T(1,13,15) in cycle 21.
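The fan structure of a regular cave described above can be illustrated with a short Python sketch. It is a software illustration only, not the hardware datapath: the function names `cave_fan` and `new_vertex_counts` are hypothetical, and the cave between v5 and v9 is an invented example rather than part of Fig. 4.

```python
from typing import List, Tuple

Triangle = Tuple[int, int, int]

def cave_fan(i: int, k: int) -> List[Triangle]:
    """Triangles filling a regular cave between boundary vertices i and i+k.

    Every triangle shares the opening vertex i and reuses the last vertex
    of the previous triangle, so k - 1 triangles are produced.
    """
    return [(i, i + j, i + j + 1) for j in range(1, k)]

def new_vertex_counts(triangles: List[Triangle]) -> List[int]:
    """Number of vertices each triangle needs that were not seen before."""
    seen = set()
    counts = []
    for tri in triangles:
        counts.append(sum(1 for v in tri if v not in seen))
        seen.update(tri)
    return counts

# Hypothetical regular cave between v5 and v9 (k = 4).
triangles = cave_fan(5, 4)
print(triangles)                     # [(5, 6, 7), (5, 7, 8), (5, 8, 9)]
print(new_vertex_counts(triangles))  # [3, 1, 1]
```

After the first triangle, each further triangle needs exactly one new vertex, which is what makes a rate of one triangle per cycle reachable for regular caves.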
Fig. 13. Wave signals of the Convexification Unit (signals shown: CLK, Vi (CH), CH_flag, T, T_flag).
Fig. 14. Wave signals of the Corner Unit (signals shown: CLK, Cj, vi, T, T_flag).
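The triangle sequence reported for Fig. 14 can be reproduced with a small software model of the Corner Triangle Generator rules. The Python sketch below is an illustration under stated assumptions, not the hardware implementation: the function `corner_triangles` and its `shift_at` callback are hypothetical, and the callback merely stands in for the cross-product sign test, whose exact operands are not reproduced here. The shift position used in the example (at hull vertex 1) is read off the trace of Fig. 14.

```python
from typing import Callable, List, Sequence, Tuple

def corner_triangles(
    hull: Sequence[int],
    corners: Sequence[str],
    shift_at: Callable[[int, int], bool],
) -> List[Tuple]:
    """Join the convex hull vertices of one section to the uncovered corners.

    hull     -- convex hull vertex indices, in order
    corners  -- uncovered cell corners, in cyclic order
    shift_at -- shift_at(i, j) is True when the corner must advance while
                the current hull vertex is hull[i] and the corner is
                corners[j] (placeholder for the cross-product sign test)
    """
    triangles = []
    i, j = 0, 0
    while i < len(hull) - 1:
        if j + 1 < len(corners) and shift_at(i, j):
            # Corner shift: two corners plus the current hull vertex.
            triangles.append((corners[j], hull[i], corners[j + 1]))
            j += 1
        else:
            # Regular step: current corner plus two consecutive hull vertices.
            triangles.append((hull[i], corners[j], hull[i + 1]))
            i += 1
    # Hull exhausted: the last vertex is joined to the remaining corners.
    while j + 1 < len(corners):
        triangles.append((corners[j], hull[-1], corners[j + 1]))
        j += 1
    return triangles

# Example of Fig. 4: hull {0,1,15,16}, uncovered corners c0 and c1,
# with the corner shift assumed to occur at hull vertex 1.
example = corner_triangles([0, 1, 15, 16], ["c0", "c1"], lambda i, j: i == 1)
print(example)
```

With these inputs the model emits T(0,c0,1), T(c0,1,c1), T(1,c1,15) and T(15,c1,16), one per loop iteration, matching the order listed in the text.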
the following stages. We have demonstrated with our implementation that the hardware requirements are very low. We have tested our architecture in an extreme situation with only partially covered cells. Even in this extreme case, we have obtained a processing rate of 0.92 triangles/cycle for the partially covered cells, that is, almost one triangle is generated per cycle. Note that this rate involves only triangles associated with partially covered cells and that, for normal hybrid models, completely covered and non-covered cells also have to be considered. This implies that the graphics pipeline is also devoted to the processing of the TIN triangles and the uncovered cells.

6. Conclusions

In this paper, we have presented an architecture for hybrid representation of terrains based on local convexifications. The main objective of this work has been to design a fast and efficient architecture for terrain representation based on a hybrid system combining TINs with regular grids. The main challenge has been to decouple the tessellation units associated with the algorithm and, with this, to increase the productivity rate associated with the partially covered cells. We have shown that a fast architecture is possible even when the complete algorithm manages more than one TB section per cell. The productivity rate for irregular systems is close to one triangle per cycle for the partially covered cells. Taking into account that the graphics pipeline is also devoted to non-covered cells and to the TIN, the productivity rate obtained is more than satisfactory. We have implemented the proposed design in a Virtex-II 2000 FPGA. Only 5% of the resources have been employed, allowing mapping of the
design onto a smaller device. The resulting operating frequency is 89 MHz.

References

[1] M. Amor, M. Bóo, J. Döllner, Hardware support for hybrid grid representation of terrains, Technical Report, University of Santiago de Compostela, 2004.
[2] M. Amor, M. Bóo, J. Hirche, M. Doggett, W. Strasser, A meshing scheme for efficient hardware implementation of butterfly subdivision surfaces using displacement mapping, IEEE Computer Graphics and Applications (2005) 46–59.
[3] K. Baumann, J. Döllner, K. Hinrichs, Integrated multiresolution geometry and texture models for terrain visualization, in: Joint Eurographics – IEEE TVCG Symposium on Visualization 2000, 2000, pp. 157–166.
[4] M. Bóo, M. Amor, M. Doggett, J. Hirche, W. Strasser, Hardware support for adaptive subdivision surface rendering, in: Siggraph/Eurographics Hardware Workshop, 2001, pp. 33–40.
[5] M. Bóo, M. Amor, J. Döllner, Unified hybrid terrain representation based on local convexifications, Geoinformatica, in press, doi:10.1007/s10707-006-0003-y.
[6] M. Doggett, J. Hirche, Adaptive view dependent tessellation of displacement maps, in: Siggraph/Eurographics Hardware Workshop, 2000, pp. 59–66.
[7] J. Dykes, A.M. MacEachren, M.-J. Kraak, Exploring Geovisualization, Elsevier, 2005.
[8] M. Kraus, T. Ertl, Simplification of nonconvex tetrahedral meshes, in: NSF/DoE Lake Tahoe Workshop for Scientific Visualization, 2000.
[9] M. Kraus, T. Ertl, Cell-projection of cyclic meshes, in: IEEE Visualization 2001, 2001, pp. 215–222.
[10] D. Luebke, M. Reddy, J. Cohen, A. Varshney, B. Watson, R. Huebner, Level of Detail for 3D Graphics, Morgan Kaufmann, 2002.
[11] J. O'Rourke, Computational Geometry in C, second ed., Cambridge University Press, Cambridge, 1998.
[12] T.K. Peucker, R.J. Fowler, J.J. Little, The triangulated irregular network, in: Proceedings of ASP-ACSM Symposium on DTM's, 1978.
[13] P.L. Williams, Visibility ordering meshed polyhedra, ACM Transactions on Graphics, 1992, pp. 103–126.
[14] Xilinx, Xilinx 4000 series datasheet.
[15] T. Yilmaz, U. Güdükbay, V. Akman, Modeling and visualization of complex geometric environments, in: Geometric Modeling: Techniques, Applications, Systems and Tools, 2004, pp. 4–30.
Margarita Amor is currently an associate professor at the Department of Electronics and Systems of the University of A Coruña. She received the B.S. and Ph.D. degrees in Physics from the University of Santiago de Compostela (Spain) in 1993 and 1997, respectively. Her research interests are mainly focused on the areas of video processing, computer graphics and parallel computing.
Montserrat Bóo received the B.S. and Ph.D. degrees in Physics from the University of Santiago de Compostela (Spain) in 1993 and 1997, respectively. She is currently an associate professor in the Department of Electronics and Computer Engineering at the University of Santiago de Compostela. Her interests are in the areas of VLSI digital signal and image processing, computer graphics and computer arithmetic.