Computers & Graphics ∎ (∎∎∎∎) ∎∎∎–∎∎∎
Contents lists available at ScienceDirect
Computers & Graphics journal homepage: www.elsevier.com/locate/cag
Special Section on SIBGRAPI 2016
A hierarchical factorization method for efficient radiosity calculations José Pedro Aguerre n, Eduardo Fernández Centro de Cálculo, Universidad de la República, Uruguay
art ic l e i nf o
a b s t r a c t
Article history: Received 19 February 2016 Received in revised form 1 August 2016 Accepted 3 August 2016
The radiosity problem can be expressed as a linear system, where light interactions between patches of the scene are considered. Its resolution has been one of the main subjects in Computer Graphics, which has led to the development of methods focused on different goals. For instance, in inverse lighting problems, it is convenient to solve the radiosity equation thousands of times for static geometries. Also, this calculation needs to consider many (or infinite) light bounces to achieve accurate global illumination results. Several methods have been developed to solve the linear system by finding approximations or other representations of the radiosity matrix, because the full storage of this matrix is memory demanding. Some examples are hierarchical radiosity, progressive refinement approaches, or wavelet radiosity, which may become slow for many bounces. Recently, new direct methods have been developed based on matrix factorization. This paper introduces a novel and efficient error-bounded factorization method based on the use of multiple singular value decompositions and the Z-order curve to sort the patches of the model. This technique accelerates the factorization of in-core matrices, and allows to work with out-of-core matrices passing only one time over them. Using this method, the inverse of the radiosity matrix can be efficiently approximated, reducing the memory and time resources needed to compute radiosity with infinite bounces. In the experimental analysis, the presented method is applied to scenes up to 163 K patches. After a precomputation stage, it is used to solve the radiosity problem for fixed geometries at interactive times. & 2016 Elsevier Ltd. All rights reserved.
1. Introduction Radiosity is a method for global illumination (GI) calculation on scenes with Lambertian surfaces. This method has been the subject of study in several areas such as computer animation, architectural design and heat transfer [1]. To implement this technique, iterative methods have been studied and used [2,3], failing to compute solutions at real-time with many bounces. Other GI methods were proposed and developed, such as instant radiosity [4], precomputed radiance transfer [5], or GPU-based global illumination [6]. These techniques allow modeling many light effects, taking advantage of the new hardware architectures. Nevertheless, they also fail to provide real-time solutions when simulating many bounces of light. Recently, new techniques have been developed to solve the real-time infinite bounce radiosity problem, for a static geometry and dynamic emission [7,8]. n
Corresponding author. E-mail addresses: jpaguerre@fing.edu.uy (J.P. Aguerre), eduardof@fing.edu.uy (E. Fernández).
Usually, due to spatial coherence, the form factors matrix [9] (F, see Table 1) has a low numerical rank. This fact enables the application of factorization techniques to compute low rank approximations of F that can be stored in main memory. Matrix factorization [10] is applied in several subjects related to mathematics and computer sciences. It is useful when matrices with low numerical rank are present in the problem. Nevertheless, it is not always simple to compute the factorization due to memory and execution time reasons. Hence, it is important to continue developing techniques to exploit the different characteristics offered by every specific problem. In this work we present a hierarchical factorization (HF) technique for radiosity calculations. This technique implements a divide-and-conquer strategy based on the calculation of multiple singular value decomposition (SVD). In addition, the sorting of rows/columns of the matrix RF by similarity is exploited to speedup the technique. For this purpose, we use the Z-order curve [11] over the patches of the scene, because each patch is associated to a row/column in the matrix. This method is intended to accelerate the factorization of in-core and out-of-core RF matrices, requiring just one pass.
http://dx.doi.org/10.1016/j.cag.2016.08.003 0097-8493/& 2016 Elsevier Ltd. All rights reserved.
Please cite this article as: Aguerre JP, Fernández E. A hierarchical factorization method for efficient radiosity calculations. Comput Graph (2016), http://dx.doi.org/10.1016/j.cag.2016.08.003i
J.P. Aguerre, E. Fernández / Computers & Graphics ∎ (∎∎∎∎) ∎∎∎–∎∎∎
2
about low-rank properties of radiosity and radiance matrices can be found in Baranoski et al. [13], Ashdown [14], Hasan et al. [15], and Fernández [7]. This property allows us to approximate RF by the product of two matrices Uk VTk , both with dimension n k (n c k), without losing relevant information. The memory required to store Uk and VTk is O(nk), while for RF it is Oðn2 Þ. When n c k, the memory savings can be significant, even allowing to store them in system memory for large scenes. If RF is replaced by its approximation Uk VTk in Eq. (1), Eq. (2) is obtained, where the radiosity B is now transformed into its ~ approximation B.
Table 1 Symbol notation and meaning. Symbol
Definition
HF n k, r σmax σi ε εp εch μ, σ
Hierarchical factorization method Number of patches Matrix rank Largest singular value (σmax ¼ σ1) ith singular value Expected error Expected error of parent node Expected error of child node Mean and standard deviation
u, v vmax B B~
Left and right singular vectors Right singular vector associated to σmax Radiosity vector Approximation to the radiosity vector
E
Emission vector
I Ik R F RF ðI RFÞ Uk , V k
n n identity matrix k k identity matrix Diag. matrix with reflectivity indexes Form factors matrix Matrix result of the product R F Radiosity matrix
Qk Dk ðA1 j A2 Þ
I Uk VTk B~ ¼ E
ð2Þ
Using theSherman–Morrison–Woodbury formula [10], the matrix I Uk VTk is inverted: 1 VTk E ð3Þ B~ ¼ E þ Uk Ik VTk Uk 2 Hence, Oðnk Þ operations and O(nk) memory are required to find B~ using Eq. (3). Eq. (3) can be transformed as follows: ð4Þ B~ ¼ E Y k VTk E
n k matrices, Uk VTk RF Uk Dk Diagonal matrix with σ 1 …σ k Matrix partition in two column blocks
where
1 Y k ¼ Uk Ik VTk Uk
2.1. The radiosity method
Y k is an n k matrix. When the geometry and the reflectivity of the scene are static, Uk , Vk , and Y k are computed only once in a pre-computation stage. After Y k is found, the calculation of B~ has complexity O(nk). This result is useful when the radiosity equation needs to be solved many times, and E is the only element modified in Eq. (2). One example of a factorization technique applied to radiosity problems is the Low Rank Radiosity (LRR) method [8]. This method allows us to factorize the RF matrix into one full matrix and one sparse matrix. This technique is relatively fast and can be computed by passing just one time over the whole matrix. The main problem of this technique is that two different scene meshes are needed, with two different granularity levels (coarse and fine meshes). The number of patches of these meshes define the dimensions k and n of the factorization matrices. For many scenes, it is not possible to define a coarse mesh with a small enough number of patches, useful for radiosity purposes. This can lead to large Y k and Vk matrices.
The radiosity equation (Eq. (1)) computes the illumination of a scene by solving a linear system [12].
2.3. Relation between the error in the factorization of RF and the radiosity error
The rest of the paper is organized as follows. Section 2 introduces some related work, while Section 3 describes the proposed algorithm and studies its error estimation. The experimental analysis is reported in Section 4, including studies on the performance of the algorithm and a comparison with other factorization techniques. Finally, in Section 5 the research is summarized and some lines of future work are defined.
2. Related work This section introduces some previous work related to our proposal, which includes the radiosity problem and some state-ofthe-art factorization techniques.
ðI RFÞB ¼ E
ð1Þ
In this equation, all matrices have n n dimension, where n is the number of patches. I is the identity matrix, R is a diagonal matrix containing the reflectivity of each patch and F is the form factors matrix, where each cell (i,j) indicates the fraction of light reflected or emitted by patch i that arrives into patch j. Each row of the F matrix is computed based on the scene view from the corresponding patch. When close patches have similar views (there is spatial coherence in the scene), the F matrix has many similar rows/columns. Finally, B and E are vectors containing the radiosity value (W=m2 ) of each patch and its emission, respectively. 2.2. Factorization of the RF matrix There is a clear relation between the spatial coherence of a scene and the low numerical rank of the matrix RF. References
Following Fernández [7], when RF is substituted by RFþ ΔRF, the relative error of the radiosity B (Eq. (1)) is upper bounded: ‖B~ B‖ ‖ΔRF‖ r ‖B‖ 1 ‖RF‖ Taking into consideration that ‖ΔRF‖ r ε, where ε is the expected error of the factorization, and using the 2-norm, another bound can be formulated: ‖B~ B‖2 ε r ‖B‖2 1 σ max
ð5Þ
where 0 o σ max r 1 is the largest singular value of RF. This bound is larger than ε. Experimental results show that in common situations the relative error of the radiosity results can be orders of magnitude smaller.
Please cite this article as: Aguerre JP, Fernández E. A hierarchical factorization method for efficient radiosity calculations. Comput Graph (2016), http://dx.doi.org/10.1016/j.cag.2016.08.003i
J.P. Aguerre, E. Fernández / Computers & Graphics ∎ (∎∎∎∎) ∎∎∎–∎∎∎
00
2.4. SVD decomposition The singular value decomposition (SVD) of a matrix Amn is a factorization of the form A ¼ UDVT where Umm and Vnn are unitary matrices, and Dmn is a diagonal matrix with non-negative values. The values in D are known as the singular values of A (represented as σi, where σ i ¼ Dði; iÞ). The columns of U and V are its left and right singular vectors respectively. The singular values in D are high-to-low ordered ðσ 1 Z σ 2 Z … Z 0Þ. Following Eckart–Young Theorem [16], if we only consider the r largest singular values of D, then Ar ¼ Ur Dr VTr ^ 2 for all A^ with rankðAÞ ^ rr. Here, ðrankðAr Þ r rÞ minimizes ‖A A‖ the matrices Ur and Vr are the left and right singular vectors associated to the r singular values selected, and Dr is a diagonal matrix containing these values. Also, it is worth mentioning that, given the matrix A, ‖A Ar ‖2 ¼ σ r þ 1 . In this work, the SVD decomposition is used with this notation: ½Q ; V ¼ SVDðAÞ, where Q ¼ UD. In particular, the truncated SVD [10] is applied: ½Q r ; Vr ¼ TSVDðA; εÞ
3
01
10
11
00 0000 0001 0100 0101 01 0010 0011 0110 0111 10 1000 1001 1100 1101 11 1010 1011 1110 1111 Fig. 1. Implementation of the Z-order curve by bit interleaving.
ð6Þ
where ε is a given “error”, r is the index of the smallest singular value σr that satisfies σ r Z ε, and Q r and Vr are the first r columns, from left to right, of the SVD resulting matrices. 2.5. Other factorizations Finding a low rank approximation of a matrix is a fundamental subject in applied mathematics, scientific computing and numerical analysis. If the Frobenius or 2-norm is used in the error calculation, the truncated singular value decomposition (TSVD) method can be used. However, in cases where the matrix is large, this technique may become too expensive. Therefore, some less expensive alternatives have been implemented. These solutions can be classified into deterministic and non-deterministic algorithms. For example, the former could use the Lanczos bidiagonalization process [17,18], while the latter use a Monte-Carlo based algorithm [19,20]. Also, the algorithms may or may not support out-of-core matrices. One factorization technique used to get low-rank approximations is the QR decomposition [10], and in particular the rank revealing QR decomposition (RRQR) [21,22]. Halko et al. [23] describe a set of randomized algorithms, for instance, the Randomized Subspace Iteration (RSI). Given a matrix A, these algorithms first use random sampling to construct a low-dimensional subspace that approximates the range of A. Then, the matrix is restricted to this subspace and a decomposition like QR or SVD is applied to the reduced matrix. In an experimental stage, these algorithms have shown to be simple but highly efficient. However, the need to pass over the whole matrix several times makes them inefficient for matrices that do not fit into fast memory. Finally, Aizenbud and Averbuch [24] describe the SubGaussian-based Randomized SVD Decomposition (SGRSVD). This method uses sub-Gaussian random matrices as sparse projections and the QR decomposition to find an orthonormal basis. The pseudoinverse algorithm is applied to compute an approximation to the SVD. 2.6. Space-filling curves and the Z-order curve In Computer Graphics, it is often possible to exploit the spatial coherence of scenes, in order to improve the efficiency of methods. For this purpose, several techniques to sort the elements of the scene by nearness have been developed. The use of space-filling curves can be a good solution to this problem. A space-filling curve is a function which maps multidimensional data to one dimension
1
2
3
4
5
6
7
8
Fig. 2. Sorting 2D triangles using the Z-order curve.
while preserving locality of the data points. Some examples are the Peano, Hilbert [25], and Z-order [11] curves. They define an indexing scheme that assign spatially local elements to local indexes. The Z-order curve, named after its “Z” shape, can be implemented with a few bitwise operations per index computation (see Fig. 1). Each dimension of the space is divided into 2N intervals. Therefore, each interval can be tagged with N bits, generating 22N quadrants in a 2D space. In this case, a quadrant is identified by 2N bits, computed by interleaving the bits of its dimensions. These identifiers follow the Z-order curve, and are assigned to the geometrical elements that belong to each quadrant. In this way, the elements are sorted by Z-order. Fig. 2 shows an example of sorting triangles in a 2D space by nearness using this strategy. The technique can be used in any N-dimensional space, though in this paper its 3D variant is used.
3. Method overview In this section, the HF method is proposed for the factorization of low-rank RF matrices. Our algorithm processes groups of columns of the matrix and extracts their principal components. Each group is generated dynamically, making the process suitable for out-of-core matrices. This approach works better with matrices that have a similarity coherence between near columns. Therefore, a method based on the Z-order curve is used. Using a divide-and-conquer strategy, the input matrix A is divided into two blocks of columns A1 and A2 such that ðA1 j A2 Þ ¼ A. These submatrices are factorized, combined and reduced using the SVD method. The technique is based on a recursive scheme, where a binary tree structure is formed (Fig. 3). Each node implies an SVD computation. The HF process results in two matrices Q and V, where QVT A.
Please cite this article as: Aguerre JP, Fernández E. A hierarchical factorization method for efficient radiosity calculations. Comput Graph (2016), http://dx.doi.org/10.1016/j.cag.2016.08.003i
J.P. Aguerre, E. Fernández / Computers & Graphics ∎ (∎∎∎∎) ∎∎∎–∎∎∎
4
A,ε
1. Partitioning
2. Factorization V
A1 A 2 Q
T
εp VkT
Qk A1,εch
A2,εch V1k’T Q1k’
V2k’’T Q2k’’
Fig. 3. Binary tree structure of the recursion. The input matrix A is divided into two matrices that are factorized separately. Then, the results are combined and another factorization is performed. This results in the outputs Q and V, such that ‖A QVT ‖2 r ε.
The base case of the recursion consists in applying the truncated SVD to each matrix on a leaf node to find a factorization at a given error εch. On the other hand, the recursive case consists in two recursive calls, one per block of columns. Then, the results of both child nodes are merged and another truncated SVD is executed at a given error εp. The values of εch and εp must be selected such that the error of the final factorization is bounded by ε. To explain the method, the recursive case needs to be defined. The approximation of the two child nodes A1 and A2 , which are Q 1 k0 V1 T0 and Q 1 k00 V1 T00 respectively, are joined and re-arranged in k k the following way:
Using Eqs. (7) and (9), the outputs Q and V of the recursive step are defined as: Q ¼ Q r ; VT ¼ VTr VTk ¼ VTr;1 V1 Tk0 j VTr;2 V2 T00 ð12Þ k
The algorithm applies the recursive case several times, which defines a tree structure. In the following sections the relation between the errors ε, εch and εp is analyzed, to find Q and V that comply with ‖A QVT ‖2 r ε for a given ε. Also, the row/column sorting strategy is explained and a pseudocode for the algorithm is presented. 3.1. Relation between norms
ð7Þ
0
00
where k ¼ k þ k . The columns of the matrix VTk are trivially orthonormal, because it is composed of two matrices with orthonormal columns. Also, the approximations to the matrices A1 and A2 are calculated recursively, such that Eq. (8) is satisfied for a given εch.
ε
‖A1 Q 1 k0 V1 Tk0 ‖2 r ch
and
ε
‖A2 Q 2 k00 V2 Tk00 ‖2 r ch
ð8Þ
Having Q k , TSVD is applied for a given εp, mainly because the columns of Q 1 k0 are not orthogonal with those of Q 2 k00 , and also because the number of columns could be reduced: ð9Þ Q r ; Vr ¼ TSVD Q k ; εp where
It can be proved that, for any matrix A ¼ ðA1 j A2 Þ, ‖A1 ‖2 þ ‖A2 ‖2 is an upper bound of ‖A‖2 . Nevertheless, here it is proved that qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ffi ‖A1 ‖22 þ ‖A2 ‖22 is also an upper bound of ‖A‖2 , which is a better bound because: qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ‖A1 ‖22 þ ‖A2 ‖22 r ‖A1 ‖2 þ ‖A2 ‖2 Hence, as a first step on the error estimation, the following theorem is defined and proved for 2-norm. Theorem 1. Given any matrix A ¼ ðA1 j A2 Þ, it satisfies that qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ‖A‖2 r ‖A1 ‖22 þ ‖A2 ‖22 . Proof. By definition of 2-norm: ‖A‖2 ¼ sup ‖Av‖2 such that ‖v‖2 ¼ 1 Now, vmax is the orthonormal vector that reaches the supremum (for 2-norm vmax is the right singular vector associated to the σmax), so ‖A‖2 ¼ ‖Avmax ‖2 ¼ σ max
Q k Q r VTr Q r is a n r matrix and Vr is a k r matrix, and by construction of TSVD it holds that ‖Q k Q r VTr ‖2 r εp VTr
ð10Þ 0
00
is divided into its first k columns and its last k After that, columns: ð11Þ VTr ¼ VTr;1 j VTr;2
By construction, A also satisfies that: T ‖Avmax ‖2 ¼ ‖ðA1 j A2 Þ vTmax1 j vTmax2 ‖2 ¼ ‖A1 vmax1 þA2 vmax2 ‖2 ;
T where vmax ¼ vTmax1 j vTmax2
Then, using the triangle inequality, it can be stated that: ‖A‖2 ¼ ‖A1 vmax1 þ A2 vmax2 ‖2
Please cite this article as: Aguerre JP, Fernández E. A hierarchical factorization method for efficient radiosity calculations. Comput Graph (2016), http://dx.doi.org/10.1016/j.cag.2016.08.003i
J.P. Aguerre, E. Fernández / Computers & Graphics ∎ (∎∎∎∎) ∎∎∎–∎∎∎
r ‖A1 ‖2 ‖vmax1 ‖2 þ ‖A2 ‖2 ‖vmax2 ‖2
performs the factorization (HF), an auxiliary function (COMBINE), and the TSVD function.
Now, we introduce two useful results:
Algorithm 1. Hierarchical factorization.
1. If ‖ðvT1 j vT2 Þ‖22 ¼ 1, then ‖v1 ‖22 þ ‖v2 ‖22 ¼ 1. pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 2 2. If a; b; x; y A R & x2 þ y2 ¼ 1, then ax þ by r a2 þ b . Therefore, applying them in the triangular inequality: qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ‖A‖2 r ‖A1 ‖2 ‖vmax1 ‖2 þ ‖A2 ‖2 ‖vmax2 ‖2 r ‖A1 ‖22 þ ‖A2 ‖22
1: 2: 3: 4:
□
5: 3.2. The error of HF and its relation with
εch and εp
6: 7:
Given a matrix A and an error ε, the proposed method allows us to find Q and V such that ‖A QVT ‖2 r ε. To obtain this result, it is necessary to estimate the values of εch and εp. Based on Eqs. (7) and (9), the following equations are defined: A
¼ Q k VTk
þ ΔA;
Qk ¼
Q r VTr
þ ΔQ k
ð13Þ
ΔA can be divided into two blocks of columns ðΔA1 j ΔA2 Þ ¼ ΔA, which satisfy that: ΔA1 ¼ A1 Q 1 k0 V1 Tk0 and ΔA2 ¼ A2 Q 2 k00 V2 Tk00 By operating with Eqs. (12) and (13), it can be derived that: A ¼ QVT þ ΔQ k VTk þ ΔA
8: 9: 10: 11: 12: 13: 14:
function MAIN(mdl, ε, lvl) [mdl, n] ¼Z-ORDERSORT(mdl)
[Q , V, k]¼ HF(mdl, 1, n, ε, lvl) end function function [Q, V, k] ¼HF(mdl, c1, c2, ε, lvl) if lvl a 0 then ▹recursive case 0
½Q 1 k0 ; V1 k0 ; k ¼ HFðmdl; c1 ; ⌊c1 þ2 c2 c; 2pε ffiffi2 ; lvl 1Þ 00
½Q 2 k0 ; V2 k00 ; k ¼ HFðmdl; ⌊c1 þ2 c2 c þ 1; c2 ; 2pε ffiffi2 ; lvl 1Þ
0 00 ½Q ; V; k ¼ COMBINE ðQ 1 k0 ; V1 k0 ; Q 2 k00 ; V2 k00 ; k ; k ; 2ε Þ else ▹ base case RFc1 ‥c2 ¼ GENFORMFACTORS(mdl, c1, c2) ½Q ; V; k ¼ TSVDðRFc1 ‥c2 ; εÞ end if end function
15: function [Q r,V, r]¼ COMBINE ðQ 0 ; V1 0 ; Q 00 ; V2 00 ; k0 ; k00 ; εÞ 1k 2k k k 16: ½Q r ; Vr ; r ¼ TSVDððQ 1 k0 j Q 2 k00 Þ; εÞ ▹ Eq. (9) 0 17: Vr;1 ¼ Vr ð1 : k ; :Þ ▹ Eq. (11) 0
Now, by subtracting QVT from both sides and applying the triangular inequality, we obtain:
18:
‖A QVT ‖2 r ‖ΔQ k VTk ‖2 þ ‖ΔA‖2
20: end function
Because Vk is orthonormal, ‖ΔQ k VTk ‖2 ¼ ‖ΔQ k ‖2 , and due to qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi Theorem 1, ‖ΔA‖2 r ‖ΔA1 ‖22 þ ‖ΔA2 ‖22 , therefore the above inequality can be transformed into: qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ‖A QVT ‖2 r ‖ΔQ k ‖2 þ ‖ΔA1 ‖22 þ ‖ΔA2 ‖22 Finally, substituting the norms by Eqs. (8) and (10) it can be established that: pffiffiffi ‖A QVT ‖2 r εp þ 2εch Then, given an error ε, any pair εch and εp that satisfies Eq. (14) allows us to find Q and V such that ‖A QVT ‖2 r ε. pffiffiffi εp þ 2εch r ε ð14Þ There is not a unique set of εch and εp values that satisfies the inequality. Eq. (15) is used here, where ε is equally distributed in both terms of Eq. (14):
ε εch ¼ pffiffiffi; 2 2
εp ¼
ε 2
5
ð15Þ
3.3. Row/column sorting The sorting of rows/columns by similarity in the matrix accelerates the factorization method, because the intermediate matrices tend to have smaller ranks on each level of the recursion. In this work, a 3-dimensional Z-order curve [11] is used to sort the patches of the scene by nearness. In Section 4 it is shown that this method is efficient for sorting the rows/columns in order to speedup the factorization method.
19:
0
00
Vr;2 ¼ Vr ðk þ 1 : k þ k ; :Þ V¼
ðVTr;1 V1 T0 j VTr;2 V2 T00 Þ k k
▹ Eq. (11) ▹ Eq. (12)
21: function [Q ,V, r] ¼TSVD(A, ε) 22: [U, D, V] ¼SVD(A) 23: r ¼ maxi ðσ i Z εÞ ▹ #columns of Q and V 24: Q ¼U(:,1:r)D(1:r,1:r) 25: V ¼V(:,1:r) 26: end function
In the MAIN routine (lines 1–4), the input data are the geometric model (mdl), the expected error (ε), and the number of levels used in the recursion (lvl). The patches of the scene are sorted following the Z-order curve, and HF is invoked. The function HF (lines 5–14) implements Eqs. (7), (9), (11) and (12). Its inputs are mdl, 2 column indexes (c1 and c2), ε, and lvl. This procedure finds a factorization of the matrix formed by the columns c1..c2 of the form factors matrix. It follows a divide-andconquer strategy, dividing that matrix in two blocks of columns and invoking recursive calls to factorize them (lines 7–8, see Eq. (7)). After that, the COMBINE function is called (line 9). This function assembles and reduces the factorization of the split matrix (lines 15–20, see Eqs. (9), (11) and (12)). In the calls to HF and COMBINE functions, the values of ε satisfy Eq. (15). In the base case (lines 11–12), the procedure GENFORMFACTORS takes the geometric model as input and generates the form factors associated to the columns c1 ‥c2 of the RF matrix. After that, TSVD (lines 21–26) is called to factorize it. As can be appreciated, the RF matrix is generated by parts, which allows us to work with matrices that are bigger than system memory.
3.4. HF pseudocode
4. Experimental analysis
In order to clarify some concepts and to provide implementation details, Algorithm 1 presents the pseudocode of the proposed method. Four functions are described: the MAIN routine, the one that
In order to analyze our proposal, the algorithm was implemented using MATLAB and the SVD function presented by Vijayan [26]. Note that, for simplicity reasons, the only parallelism used is
Please cite this article as: Aguerre JP, Fernández E. A hierarchical factorization method for efficient radiosity calculations. Comput Graph (2016), http://dx.doi.org/10.1016/j.cag.2016.08.003i
J.P. Aguerre, E. Fernández / Computers & Graphics ∎ (∎∎∎∎) ∎∎∎–∎∎∎
6
Fig. 4. Cornell box subdivided into 640 triangles.
the MATLAB native multi-threading, leaving other possible optimizations for future works. All tests were performed on an Intel i7 processor along with 16GB of RAM memory. The form factors are computed using the hemi-cube technique [9], implemented with OpenGL and CUDA, and executed on a Geforce GTX 780 with 4GB of RAM. The matrices are generated dynamically on each leaf of the recursive tree, allowing to process matrices that cannot be allocated in system memory. 4.1. Algorithm calibration In this section, we study the impact on the algorithm of parameters such as the number of levels in the recursion and ε. For this purpose, the radiosity matrix corresponding to the Cornell Box scene is used (Fig. 4), with different number of patches. The number of patches used are 2560, 10,240, 40,960, and 163,840. The execution times needed to generate all the column blocks of F are 3.76 s, 13.8 s, 61.0 s, and 369 s respectively. 4.1.1. Levels of the recursion There is a relation between the recursion levels and the execution times, for a given ε. Moreover, this relation also depends on the scene and its number of patches. Fig. 5 presents the execution times of the algorithm for the Cornell Box scene with different number of patches and different ε. In the case of 0 levels (where just TSVD is applied), the results do not depend on ε. This happens because the implementation of the TSVD technique does not take advantage of ε to speed up the process. Also, the optimal number of levels depends on the expected error. 4.1.2. Z-order curve technique Next, a test to show the usefulness of the Z-order curve technique (described in Section 3.3) is presented. It is important to say that this technique is a linear process, and that its execution times are very small when compared to the factorization time, for every size. Table 2 shows the execution times, obtained errors and ranks for the algorithm with and without the use of this technique. All the results were executed using the optimal number of levels in the recursion. As can be seen, the use of Z-order curve makes the factorization code more efficient for large matrices, reaching a speedup up to 2x. On the other hand, the real ε displayed corresponds to the 2norm of the difference between the original and factorized matrix; for size n¼ 163,840, ε could not be calculated due to memory reasons. In practice, the error converges roughly to ε=2 when the
Fig. 5. Execution times for the Cornell Box scene with different number of patches.
Table 2 Execution times and obtained error for ε¼ 0.1, with and without the use of Z-order curve. Size
No Z-order Real ε
Time (s) 2560 10,240 40,960 163,840
Z-order
0
1.31 10 8.24 100 2.60 102 4.65 104
0.0501 0.0505 0.0500 –
Rank 416 874 1786 3981
Time (s) 1
2.57 10 5.67 100 1.41 102 2.32 104
Real ε
Rank
0.0500 0.0498 0.0496 –
416 877 1791 3990
Fig. 6. Execution times for the Cornell Box with 10,240 patches.
scene has a good spatial coherence. The experiments lead us to think that a different scheme can be used, where Eq. (14) is not satisfied, and yet get a bounded error. This happens because the
Please cite this article as: Aguerre JP, Fernández E. A hierarchical factorization method for efficient radiosity calculations. Comput Graph (2016), http://dx.doi.org/10.1016/j.cag.2016.08.003i
J.P. Aguerre, E. Fernández / Computers & Graphics ∎ (∎∎∎∎) ∎∎∎–∎∎∎
spaces defined by the columns of Q 1 k0 and Q 2 k00 are often similar. More work is needed to take advantage of this aspect. Fig. 6 shows the execution times for the Cornell box scene with 10,240 patches and different recursion levels. The results are presented with and without the use of the Z-order curve technique. Besides the fact that both strategies show its optimum at 3 levels, better execution times are obtained when the row/column sort is applied. 4.1.3. Matrix rank and memory consumption Since one of the purposes of this work is to reduce the memory size of the input matrix, it is important to study the relation between the initial dimensions and the obtained ranks. This helps us to show that the proposed algorithm can take advantage of the low numerical rank of the matrices and, therefore, be utilized in radiosity calculations. Fig. 7 presents the relation between the number of patches and the obtained rank for a Cornell Box scene and different errors. On the other hand, it is important to study the memory consumption throughout the execution of the algorithm. This determines the memory requirements to perform the factorization, which are usually bigger than its final memory size. Table 3 shows the memory size of the resultant factorizations for the Cornell Box scene with different number of patches and ε ¼ 0:1, as well as the memory peak reached during the process. The memory peak
Fig. 7. Relation between #patches and rank for the Cornell Box.
7
depends on the number of levels used in the recursion. For each matrix, the results are reported using the number of levels that produce shorter execution times. The experiment shows that the memory reduction is significant when comparing the size of the full matrix and the size of the factorization. For all the test cases, the factorization is small enough to fit in the system memory of a regular PC. The memory peaks are reached during the execution of the SVD routine, and are always higher than the final size of the factorization. In spite of this, every test case was successfully executed without exceeding the memory limits (16 GB). 4.2. Comparison with other algorithms In order to compare the proposed algorithm with other existing methods, we studied several factorization techniques that compute a low-rank approximation of a given matrix. In this stage, two algorithms stood out both in execution time and precision of the resultant factorization. The first method is the Randomized Subspace Iteration, presented by Halko [23]. It offers an iterative solution to find a basis that approximates most of the action of the input matrix to factorize. Each iteration derives into more precise solutions, but each one passes 2 times over the matrix, slowing the total execution time. The second technique is the Sub-Gaussian-based Randomized SVD Decomposition (SGRSVD), presented by Aizenbud and Averbuch [24], which uses sub-Gaussian random matrices to compute an approximated truncated SVD. This method requires only two passes over the input matrix. These two algorithms were used to factorize the form factors matrix of the Cornell Box scene for different sizes. Table 4 presents these results along with the proposed method, and the regular SVD routine. The speedups of HF over the other methods and the obtained errors are reported. Our technique works faster than the rest and its accuracy is very near the optimal (SVD). Also, the SGRSVD method is the slowest for all test cases. The out-of-core matrix used corresponds to the model with n¼ 163,840. For this model, HF works faster than RSI and SGRSVD algorithms, which need more passes over the input matrix than ours. 4.3. Real-time radiosity results
Table 3 Memory consumption for the Cornell Box and ε ¼ 0:1. Size
k
Full matrix memory Factorization memory size size
Process memory peak
2560 10,240 40,960 163,840
416 874 1786 3981
25.0 MB 400 MB 6.25 GB 100 GB
184 MB 209 MB 820 MB 7.55 GB
8.12 MB 68.3 MB 558 MB 4.86 GB
It is important to analyze the results of the proposed algorithm when applied to solve the radiosity (B) of a scene. To solve radiosity, we apply the factorization results Q and V on Eq. (4), where Uk ¼ Q and VTk ¼ VT . Two more scenes were used. Fig. 8 presents the Patio scene composed of 21,886 patches, while Fig. 9 shows the Sponza Atrium, composed of 79,230 patches. Computing the elements of the F matrices takes about 30 and 130 s respectively.
Table 4 Results for factorizing the Cornell Box form factors matrix, using the three algorithms to be compared. The speedups of HF over the other algorithms and the obtained errors are reported. Size
k
HF
RSI
SGRSVD
SVD
Time (s)
Real ε
Speedup
Real ε
Speedup
Real ε
Speedup
Real ε
2560
416
2:57 10 1
0.0500
1.11
0.0570
2.83
0.0513
11.3
0.0500
10,240
877
5:67 100
0.0505
1.25
0.0586
1.60
0.0872
28.8
0.0504
40,960
1791
1:41 102
0.0500
1.38
0.0600
7.66
0.0986
39.6
0.0500
163,840
3990
2:32 104
–
1.45
–
15.23
–
–
–
Please cite this article as: Aguerre JP, Fernández E. A hierarchical factorization method for efficient radiosity calculations. Comput Graph (2016), http://dx.doi.org/10.1016/j.cag.2016.08.003i
J.P. Aguerre, E. Fernández / Computers & Graphics ∎ (∎∎∎∎) ∎∎∎–∎∎∎
8
Table 5 Execution times for computing radiosity of the Patio and Sponza atrium, with different dimensions.
Fig. 8. Patio.
Scene
k
fact. (s)
Yk (s)
B (s)
fps
Patio
3920 3114 2227 1617
97 71 58 49
7.42 4.70 2.45 1.29
0.040 0.031 0.024 0.016
25.3 32.6 41.5 62.1
Sponza
17,857 3483 1751 376
19,225 2023 1987 1524
329.42 19.66 5.25 0.33
0.921 0.128 0.067 0.015
1.1 7.8 14.8 66.7
Table 6 Relative error for radiosity calculations using different dimensions. Mean, standard deviation, and maximum value of the results for 500 different emissions are reported. ε
0.05 0.1 0.2 0.3 0.5
Patio (n ¼21,888) ‖B~ B‖=‖B‖
Cornell Box (n¼ 2560) ‖B~ B‖=‖B‖
k
μ
σ
Max
k
μ
σ
Max
3114 3920 2227 1617 932
0.0025 0.0048 0.0096 0.0142 0.0232
0.0013 0.0030 0.0073 0.0124 0.0224
0.0073 0.0170 0.0395 0.0635 0.1205
560 416 328 274 139
0.0021 0.0045 0.0088 0.0133 0.0255
0.0017 0.0042 0.0075 0.0122 0.0299
0.0094 0.0213 0.0448 0.0839 0.1713
illumination looks more real and the memory cost increases for smaller values of ε.
5. Conclusions and future work
Fig. 9. Sponza Atrium.
The obtained execution times and speed (in Frames per Second) for solving radiosity are presented in Table 5. The precomputation times (calculation of Q and V by factorization, and Y k ) are much higher than the B calculation times, which for some cases are shorter enough for a Real-Time execution ð 420 fpsÞ. This result makes the method suitable for problems where the geometry is static and the emission changes, like in video games or inverse lighting problems [8]. Table 6 shows the radiosity relative error results for different dimensions on the Patio and Cornell Box scenes. The different factorizations were tested for radiosity (using Eq. (4) to calculate ~ with 500 random emission vectors E, where only one patch is B) selected as emitter in each test. The exact radiosity B is calculated using Eq. (1). The mean, standard deviation and maximum values are reported. The minimum value is near 0 for every case. This shows that for these scenes, the relative errors in the radiosity results are smaller than the value of ε defined for the approximation of RF. Fig. 10 shows radiosity results for the Sponza atrium scene. It is important to highlight that the relative error is not reported for this scene because the size of the matrix is too large (the exact radiosity B was not calculated). Two examples of the same scene are shown, calculated with different ε values. The
The present work proposes a hierarchical technique to factorize low numerical rank matrices, applying multiple decompositions to sections of the matrix and, then, joining and reducing the results. Moreover, the sorting of rows/columns of the matrix by similarity can be used to accelerate the process. This one-pass algorithm works faster than the other studied methods for medium-to-large sized problems, and allows us to work with matrices that do not fit into system memory. The proposed method is used to solve the radiosity problem efficiently for static scenes by factorizing the matrix RF. Furthermore, the Z-order curve is used to sort the patches of the scene, which allows us to exploit its spatial coherence. As a result, it is possible to calculate in real-time the global radiosity (with infinite bounces) for multiple scenes (2560, 21,888, and 79,232 patches) with negligible relative error. Regarding future work, the application of the proposed method for diverse areas of computer graphics, such as inverse lighting problems and Computer Aided Design, should be evaluated. On the other hand, it is necessary to continue the study and development of the factorization algorithm. For instance, a better estimation of the error bound should be found, where the spatial coherence of the scene could be taken into consideration. Also, different base factorization techniques (like the RRQR decomposition) could be used. Regarding the implementation, several optimizations are pending, for example the use of multiple threads to execute each leaf decomposition in parallel. Besides, specialized hardware such as GPUs could be used to perform a faster SVD or matrix-vector and matrix-matrix operations.
Please cite this article as: Aguerre JP, Fernández E. A hierarchical factorization method for efficient radiosity calculations. Comput Graph (2016), http://dx.doi.org/10.1016/j.cag.2016.08.003i
J.P. Aguerre, E. Fernández / Computers & Graphics ∎ (∎∎∎∎) ∎∎∎–∎∎∎
9
References
Fig. 10. Radiosity results for the Sponza Atrium.
Acknowledgements The work was supported by project FSE_1_2014_1_102344 from Agencia Nacional de Investigación e Innovación (ANII, Uruguay).
[1] Dutre P, Bekaert P, Bala K. Advanced global illumination. CRC Press; 2006. [2] Cohen MF, Chen SE, Wallace JR, Greenberg DP. A progressive refinement approach to fast radiosity image generation. In: ACM SIGGRAPH computer graphics, vol. 22. Atlanta, Georgia: ACM; 1988, p. 75–84. [3] Hanrahan P, Salzman D, Aupperle L, A rapid hierarchical radiosity algorithm. In: ACM SIGGRAPH Computer Graphics, vol. 25. Las Vegas, Nevada: ACM; 1991. p. 197–206. [4] Keller A. Instant radiosity. In: Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques. SIGGRAPH '97; New York, NY, USA: ACM Press/Addison-Wesley Publishing Co.; 1997. p. 49–56. [5] Sloan PP, Kautz J, Snyder J. Precomputed radiance transfer for real-time rendering in dynamic, low-frequency lighting environments. In: ACM Transactions on Graphics (TOG), vol. 21. New York: ACM; 2002. p. 527–36. [6] Wang R, Wang R, Zhou K, Pan M, Bao H. An efficient GPU-based approach for interactive global illumination. In: ACM Transactions on Graphics (TOG), vol. 28. New York: ACM; 2009. p. 91. [7] Fernández, E. Low-rank radiosity. In: Proceedings of the IV Iberoamerican symposium in computer graphics. Margarita Island, Venezuela: Sociedad Venezolana de Computación Gráfica; 2009. p. 55–62. [8] Fernández E, Besuievsky G. Inverse lighting design for interior buildings integrating natural and artificial sources. Comput Graph 2012;36(8):1096–108. [9] Cohen MF, Greenberg DP, The hemi-cube: A radiosity solution for complex environments. In: ACM SIGGRAPH Computer Graphics, vol. 19. San Francisco, California: ACM; 1985. p. 31–40. [10] Golub GH, Van Loan CF. Matrix computations, vol. 3. Baltimore, Maryland: JHU Press; 2012. [11] Morton GM. A computer oriented geodetic data base and a new technique in file sequencing. New York: International Business Machines Company; 1966. [12] Cohen M, Wallace J, Hanrahan P. Radiosity and realistic image synthesis. San Diego, CA, USA: Academic Press Professional, Inc; 1993. [13] Baranoski GV, Bramley R, Rokne JG, Eigen-analysis for radiosity systems. In: Proceedings of the Sixth International Conference on Computational Graphics and Visualization (Compugraphics '97). Citeseer; 1997. p. 193–201. [14] Ashdown I. Eigenvector radiosity. [Master's thesis], Department of Computer Science, University of British Columbia; Vancouver, British Columbia; 2001. [15] Hasan M, Pellacini F, Bala K. Matrix row-column sampling for the many-light problem. ACM transactions on graphics, proceedings of SIGGRAPH 2007, vol. 26(3); 2007. [16] Eckart C, Young G. The approximation of one matrix by another of lower rank. Psychometrika 1936;1(3):211–8. [17] Millhauser GL, Carter AA, Schneider DJ, Freed JH, Oswald RE. Rapid singular value decomposition for time-domain analysis of magnetic resonance signals by use of the lanczos algorithm. J Magn Reson (1969) 1989;82(1):150–5. [18] Simon HD, Zha H. Low-rank matrix approximation using the Lanczos bidiagonalization process with applications. SIAM J Sci Comput 2000;21(6):2257–74. [19] Frieze A, Kannan R, Vempala S. Fast Monte-Carlo algorithms for finding lowrank approximations. J ACM 2004;51(6):1025–41. [20] Drineas P, Kannan R, Mahoney MW. Fast Monte Carlo algorithms for matrices ii: Computing a low-rank approximation to a matrix. SIAM J Comput 2006;36 (1):158–83. [21] Gu M, Eisenstat SC. Efficient algorithms for computing a strong rank-revealing QR factorization. SIAM J Sci Comput 1996;17(4):848–69. [22] Bischof CH, Quintana-Orti G. Computing rank-revealing qr factorizations of dense matrices. ACM Trans Math Softw 1998;24(2):226–53. [23] Halko N, Martinsson PG, Tropp JA. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev 2011;53(2):217–88. [24] Aizenbud Y, Averbuch A, Matrix decompositions using sub-Gaussian random matrices. Manuscript submitted for publication 2016; URL 〈http://www.cs.tau. ac.il/amir1/publications.html〉:Accesed: May 2016. [25] Voorhies D. Space-filling curves and a measure of coherence. Graph Gems II 1991:26–30. [26] Vijayan V. Fast SVD and PCA. URL 〈http://www.mathworks.com/matlabcentral/ fileexchange/47132-fast-svd-and-pca/content/svdecon.m〉; 2014. Accessed: May 2016.
Please cite this article as: Aguerre JP, Fernández E. A hierarchical factorization method for efficient radiosity calculations. Comput Graph (2016), http://dx.doi.org/10.1016/j.cag.2016.08.003i