Graphical Models 69 (2007) 33–56
www.elsevier.com/locate/gmod

Constructing a 3D trunk model from two images

Chin-Hung Teng a,*, Yung-Sheng Chen b, Wen-Hsing Hsu a

a Department of Electrical Engineering, National Tsing Hua University, Hsinchu 30013, Taiwan
b Department of Electrical Engineering, Yuan Ze University, Chung-Li 320, Taiwan

Received 17 February 2005; received in revised form 3 March 2006; accepted 7 June 2006. Available online 8 August 2006.

Abstract

Trees are a key component of the natural environment, and modeling realistic trees has therefore received much attention from researchers in computer graphics. However, most trees in computer graphics are generated according to procedural rules in conjunction with random perturbations, and are thus generally different from the real trees in the natural environment. In this paper, we propose a systematic approach to creating a 3D trunk graphical model from two images, so that the created trunk has a 3D structure similar to the real one. In the proposed system, the trunk is first segmented from the image via an interactive segmentation tool and its skeleton is then extracted. Some points on the skeleton are selected and their context relations are established to represent the 2D trunk structure. A camera self-calibration algorithm appropriate for the two-view case is developed, and a minimum curvature constraint is employed to recover the 3D trunk skeleton from the established 2D trunk structure and the calibrated camera. The trunk is then modeled by a set of generalized cylinders around the recovered 3D trunk skeleton. A polygonal mesh representing the trunk is finally generated, and a textured 3D trunk model is produced by mapping the image onto the surface of the model. We have conducted several experiments, and the results demonstrate that the proposed system can yield a visually plausible 3D trunk model that is similar to the real one in the image.

© 2006 Elsevier Inc. All rights reserved.

Keywords: Tree; Tree rendering; Tree modeling; 3D trunk model; Minimum curvature constraint; Camera self-calibration

* Corresponding author. E-mail address: [email protected] (C.-H. Teng).

1. Introduction

Creating realistic virtual environments is the dream of many people working in computer science. Imagine that we are visiting a famous park on the other side of the earth, roaming its alleys and enjoying the beauty of the natural scenery. With such a virtual environment, we could experience all of this without traveling there. Obviously, simulating a 3D natural scene is quite important for such a scenario, and since trees are very common objects in the natural environment, creating realistic 3D tree models is necessary for constructing such a scene. In fact, tree modeling has been a topic in computer graphics for many years. In this area, the 3D model of a tree is typically constructed by first generating the skeleton of the trunk according to procedural rules and then modeling the trunk with simple graphical primitives such as truncated cones or generalized cylinders [1].


The leaves are then grown and pruned to produce a more natural global appearance. Some researchers have also incorporated botanical principles into their algorithms to create trees that faithfully reflect natural phenomena [2]. Today, tree modeling is well developed and the created trees have a quite realistic visual appearance (e.g. [3]). Generally, however, trees in computer graphics are synthesized by algorithmic rules plus random variations, and are therefore different from actual trees in the environment; that is, the trees are grown on their own, without any reference to real trees in nature. In order to generate a tree similar to a specific one in the real world, some approaches [3–5] have been developed that allow the user to manipulate the created model so that a desired tree can be generated. However, this is generally tedious work and often requires an experienced user. To create a tree similar to a real one while reducing the user's effort, a method that can automatically extract the 3D information of trees from images should be developed. Shlyakhter et al. [6] developed a system for reconstructing 3D tree models from instrumented photographs. In this system, the trees are first manually segmented from the images and the visual hull of the tree is then constructed. The 3D skeleton of the tree is extracted from the visual hull and the details of the tree are finally modeled by L-systems [7]. This system produces quite good visual results, but it often requires several images (typically more than four) and needs considerable computational power for visual hull construction and 3D skeleton extraction.

In this paper, we approach the construction of 3D trunk models from a different point of view, namely, the techniques of structure from motion. Structure from motion is a long-standing research topic in computer vision; it can recover the 3D structure of objects from correspondences in several views of the objects. Typically, the success of structure from motion hinges on how accurately the correspondences are identified. Unfortunately, searching for correspondences is a rather difficult task for tree images, especially in the leaf regions. There are severe occlusions between leaves, which prevent the exact 3D position and orientation of each leaf from being recovered. Hence, the techniques of structure from motion will generally fail to reconstruct an exact 3D model of a tree. However, for many applications of 3D scene reconstruction, we do not require the exact 3D recovery of each leaf and branch.

A 3D tree model with a 3D trunk structure and global leaf appearance similar to the real tree is enough. This motivates us to combine graphical methods with the techniques of structure from motion to construct a 3D tree model that resembles the actual tree in the image. Namely, we can first recover the 3D trunk skeleton by employing structure from motion, then model the trunk using generalized cylinders, and finally generate leaves using graphical methods. The leaves should be pruned according to the image so that the resulting tree has a global appearance similar to the real one.

In this paper, we aim to develop a system for constructing a 3D trunk model from two slightly different tree images. Generally, more views produce a more accurate 3D recovery, but require more effort in searching for correspondences: we must deal with the appearance and disappearance of corresponding points across multiple views. Using only two views greatly simplifies the handling of these corresponding points and saves the computational cost of searching for correspondences in multiple views. In addition, to facilitate the search for correspondences, we also require the two images to be only slightly different.

In the proposed system, the trunk is first extracted from the image. Since object segmentation is still an open problem in computer vision, our system is semi-automatic and requires some manual adjustment to correctly extract the trunk; we utilize an interactive segmentation tool for this task. After retrieving the trunk, the 2D trunk skeleton is extracted and some points on the skeleton are selected. These selected points are then connected to represent the 2D trunk structure, which explicitly indicates the trunk-branch relations of the tree. We develop a camera self-calibration algorithm suitable for the two-view case to acquire the camera parameters, and the 3D trunk skeleton is recovered from the calibrated camera and the established 2D trunk structure. Since we require the two images to be only slightly different, the problem of reconstruction uncertainty is inevitable; to improve estimation accuracy, we include a minimum curvature constraint in the computation of the 3D trunk points to produce a more accurate 3D trunk skeleton. The 3D trunk model is then constructed by generating a set of generalized cylinders around the recovered 3D trunk skeleton, and a textured 3D trunk model can also be generated by mapping the image onto the constructed model.


In summary, the main contributions of this paper are: (1) a complete framework for constructing a 3D trunk model from merely two slightly different images; (2) a camera self-calibration algorithm suitable for the two-view case; and (3) a minimum curvature constraint that alleviates reconstruction uncertainty to produce an accurate 3D trunk skeleton. In the following, the detailed formulation of the proposed system is described and some experimental results are given.

The remainder of this paper is organized as follows. In the next section, more complete surveys of plant modeling, rendering, and camera calibration are given. In Section 3, we introduce how to create a 3D trunk model from two images; this covers trunk extraction, 2D trunk structure establishment, camera self-calibration, 3D trunk skeleton recovery, and 3D trunk model building. Some experiments are conducted and the results are discussed in Section 4. Finally, the paper ends with conclusions and future work.

2. Related work

2.1. Plant modeling and rendering

Modeling realistic plants represents a great challenge for people working in computer graphics. In fact, tree modeling was initially motivated by biologists who wanted to investigate the growth of plants. The formal exploration of plant growth began in 1968 with Lindenmayer [7], who developed the well-known L-systems for cell interaction. L-systems were later developed and enhanced by Smith [8] and by Prusinkiewicz and Lindenmayer [9,10] in computer graphics for plant modeling and development. The basic concept behind L-systems is the parallel application of the grammar rules of a rewriting system to generate a string of symbols. These symbols can be chosen to represent botanic entities such as leaves and flowers, or to specify branch sections and structural parameters such as branch angles. L-systems can also be extended to simulate phenomena such as pruning [11] and the interaction of plants with their environment [12]. Because of their extensive applications, L-systems have become the most important models for plant generation and are widely employed in many modeling systems.

In addition to L-systems, other models for plant rendering have also been developed. Reeves [13] employed a structured particle system to generate trees and grass-covered forest floors.


He modeled the tree by starting from the main trunk and generating the subbranches recursively. In his model, a tree is constructed from a set of line segments and circles representing the branches and leaves. Oppenheimer [14] presented a fractal model for branching objects. He used several parameters to model the tree, such as the angle between the main stem and the branches, the size ratio of the main stem to the branches, the rate at which the stem tapers, and so on. Random perturbations were also imposed on these parameters to avoid self-similarity. Following these models, more algorithms concentrating on different aspects of plant modeling and rendering were developed. Bloomenthal [1] modeled the limbs with a set of points and connections and utilized generalized cylinders to represent the surface of the trunk. de Reffye et al. [2] integrated botanical knowledge of the architecture of trees and plants into their model so that the created trees were faithful to botanical structure and development. Instead of stressing botanical knowledge, Weber and Penn [15] used a small number of parameters to generate a wide variety of complex, realistic trees; this model does not adhere strictly to botanical principles but can easily be used by users with no knowledge beyond basic geometry. Additionally, methods for animating plant motion and development and the interaction with the natural environment have been put forward [12,16–18]. To create more realistic trees, some researchers [3–5] developed interactive methods allowing the user to manipulate the plant structure. Lefebvre and Neyret [19] proposed an approach to bark generation to improve the realism of generated trees. To reduce computational cost while preserving realism, Lluch et al. [20] built a multiresolution representation for plants and trees. A system for reconstructing 3D tree models from instrumented photographs was also proposed by Shlyakhter et al. [6]. Recently, plant ecosystems have also been developed [21,22]: many plants populate such a system and interact with each other and with the environment, so that a quite realistic scene can be generated.

2.2. Camera calibration

Camera calibration plays an important role in the recovery of 3D scene structure. Early on, camera calibration was a central topic in the field of photogrammetry, where the camera was calibrated so that an accurate map could be generated [23].


In the field of computer vision, camera calibration is usually achieved via specific calibration patterns in conjunction with geometric entities of multiple views; some well-known calibration methods of this type can be found in [24–27]. This type of calibration is referred to as off-line calibration, since it usually proceeds in the laboratory and requires calibration objects. On the other hand, the camera can also be calibrated on-line, and this process of acquiring camera parameters is usually termed camera self-calibration. For self-calibration, both the scene and camera structures are initially unknown, and the camera parameters, both internal and external, are estimated only from image correspondences. To support camera calibration, multiple-view geometry has also been deeply explored [28]; its cases can be roughly classified into the two-view, three-view, and general multiple-view settings. Attached to the two-view geometry, which is usually referred to as epipolar geometry, is the well-known fundamental matrix. Because of its importance, many methods for estimating the fundamental matrix have been proposed; a summary and performance evaluation of these methods can be found in [29,30].

Faugeras et al. [31,32] first put forward a method for camera self-calibration by utilizing the epipolar geometry and the so-called Kruppa equations. They demonstrated that each pair of views provides two constraints on the five unknown camera parameters, so these parameters can be estimated from a sufficient number of views if they remain fixed across the views. An alternative approach was proposed by Hartley [33], who first established a projective reconstruction of the scene, then computed a quasi-affine reconstruction and utilized it to compute the calibration matrix, and finally refined the results using bundle adjustment. Some researchers have also reconstructed 3D structure and calibrated the camera by exploiting three-view geometry, i.e., the so-called trifocal tensor; Armstrong et al. [34] utilized the trifocal tensor to calibrate a camera with fixed parameters undergoing planar motion. Recently, improved approaches to camera self-calibration have been developed. Triggs [35] employed the absolute quadric to calibrate the camera and obtain a metric 3D structure. Luong and Faugeras [36] showed that correspondences between three images, together with the fundamental matrices computed from these correspondences, are sufficient to recover the internal and external parameters of the camera.

Previous techniques usually assumed fixed camera parameters during the capture of images. Bougnoux [37] proposed a calibration method that allows varying camera parameters: he derived a closed-form formula from the Kruppa equations so that the focal length can be computed directly from the fundamental matrix, epipoles, and principal points, and then utilized the obtained focal length as an initialization to further refine the results and recover the 3D scene structure. Pollefeys [38,39] employed a stratified method to achieve camera self-calibration and metric reconstruction. He developed a modulus constraint [40] to recover the scene structure up to the affine stratum, and then upgraded it to the metric stratum. He also discussed the possibility of relaxing camera constraints to allow varying and unknown internal camera parameters in [39,41]. Approaches to camera calibration under restricted camera motions have also been reported in the literature. One can refer to [28,39] for more on the development of camera self-calibration, metric 3D reconstruction, and multiple-view geometry.

3. Proposed system

3.1. System overview

The proposed system consists of several modules, as depicted in Fig. 1. Two images are fed into the system, the first being the primary image and the second the auxiliary. The auxiliary image is used only for locating corresponding points in the modules of Camera Self-Calibration and 3D Trunk Skeleton Recovery. Only the primary image is used in the module of Trunk Extraction; this module extracts the trunk and transfers the result to the module of 2D Trunk Structure Establishing. There, the skeleton of the trunk is extracted first, and the skeleton image is then partitioned into a number of small blocks called trunk elements. Within each trunk element, the central skeleton point is selected and is referred to as the 2D trunk point throughout this paper. These trunk elements are then connected according to the actual context relations of the trunk; the connected trunk elements represent the 2D trunk structure of the tree. This can be seen in the left-middle part of Fig. 1, where an enlarged picture showing several connected trunk elements is displayed. The 2D trunk structure is exploited in subsequent modules to recover the 3D trunk skeleton and create a 3D trunk model.


Fig. 1. System diagram of the proposed system for building 3D trunk models. An example and some intermediate results are also shown. On the left side, from top to bottom, are the extracted trunk, a portion of the enlarged 2D trunk structure, and the recovered 3D trunk points. On the right-hand side are the input images and the textured 3D trunk model.

Camera Self-Calibration plays an important role in the proposed system. In this paper, we present a self-calibration method appropriate for the two-view case to estimate the camera parameters. A number of corner points are selected in the primary image and their corresponding points are identified in the auxiliary image. We calibrate the camera by searching for the camera intrinsic parameters (K) and camera pose (R, t) such that the sum of squared distances between the measured and reprojected image points is minimized. The estimated camera parameters (K, R, t) are then passed to the module of 3D Trunk Skeleton Recovery, where the corresponding point for each 2D trunk point is searched for along the epipolar line in the auxiliary image and its 3D position is recovered using the calibrated camera. In this paper, these recovered 3D points are termed 3D trunk points, in contrast to the 2D trunk points. The recovered 3D trunk points, in conjunction with the previously established 2D trunk structure, constitute the 3D trunk skeleton of the tree. The 3D trunk skeleton is utilized in the final module, 3D Trunk Model Building, to create a polygonal mesh of the trunk.

The trunk texture is then mapped onto the polygonal mesh and a realistic 3D trunk model is generated, as revealed in Fig. 1. Throughout the rest of the paper, each module of the proposed system is described, and the test images shown in Fig. 1 are used for illustration. More examples are given in Section 4.

3.2. Extracting trunk

Generally, extracting the trunk from a single image is a rather difficult task. There are thousands of tree species in the world, each with a very different shape and appearance. Moreover, because of environmental variation, trees exhibit quite different colors at different times; for instance, the color of a tree appears very different on sunny and rainy days. A complex background is another problem for extracting the trunk from an image, since objects similar to the trunk can complicate the extraction. Because of these difficulties, it is quite hard to develop a robust algorithm that automatically extracts the trunk from an image under all situations.


Therefore, in our system we utilize an interactive segmentation tool to extract the trunk from the image. We consider three interactive methods and give a brief qualitative comparison of them in this section.

Perhaps the simplest interactive method for extracting the trunk is to first segment the image using a low-level segmentation algorithm such as k-means or the EM (Expectation-Maximization) algorithm [42,43], and then select the trunk from the results (a minimal sketch of this approach is given at the end of this section). In addition to color, we can also include spatial position and texture as segmentation features to obtain a better result. In our experience, the EM algorithm can produce quite good image segmentations, but it is also time-consuming. This problem can be relieved by simplifying the model parameters of the EM algorithm, thus speeding up the segmentation process. For example, we typically model the image features as a mixture of Gaussians, so the covariance matrix can be simplified from a full matrix to a diagonal matrix, or in the extreme to an identity matrix; if the prior probabilities are additionally assumed equal, the EM algorithm reduces to the well-known k-means algorithm. The computation time is greatly reduced while the performance is somewhat degraded. Generally, this method requires an interactive refinement process to manually erase or include some image pixels so that a distinct object boundary can be produced. Clearly, for the simplified model with degraded performance, more user effort is required to amend the segmentation results. This method is intuitive and easy to implement (e.g., the k-means algorithm) but requires considerable user labor and computation time.

As technology has progressed, interactive methods have recently been proposed that can separate the foreground and background within reasonable time and with less user effort. Lazy Snapping [44] and GrabCut [45], which are both based on graph cut theory, are two interactive tools that can be used to cut the desired object out of the image. In Lazy Snapping, the user first specifies foreground and background seeds using a graphical user interface; the k-means algorithm is then applied to analyze the color distributions of the foreground and background seeds. These distributions are utilized to formulate an energy function, and graph cut is employed to minimize this energy function, thus segmenting the image. This initial segmentation may not exactly coincide with the object boundary, so a boundary editing process is then applied to refine the results.

Since the foreground and background seeds can be selected far from the object boundary, and boundary editing can be achieved by dragging control points of the boundary to the desired positions via a well-defined user interface, Lazy Snapping can cut out the object with less user labor than the previous method. Moreover, by incorporating watershed segmentation [46] and the efficient min-cut/max-flow algorithm [47], Lazy Snapping can even respond to user operations in real time. GrabCut is similar to Lazy Snapping, with an initial graph-cut image segmentation followed by a border matting process. The difference is that GrabCut models the foreground and background color distributions as mixtures of Gaussians and applies an iterative method to minimize the energy function (which also differs from the formulation of Lazy Snapping). The major advantage of GrabCut is that the user does not need to specify the foreground (or background) region; the algorithm automatically learns the foreground and background distributions through the iterative minimization. However, we found that for our application of trunk extraction, GrabCut sometimes failed to correctly identify the foreground region (the trunk), so some user effort was still required to specify foreground seeds. Moreover, since it is an iterative method, it requires more computation time than Lazy Snapping. We therefore consider Lazy Snapping more suitable for our application of trunk extraction. Fig. 2 illustrates an example of trunk extraction by Lazy Snapping.

Although it is quite powerful for object segmentation, Lazy Snapping does not perform very well on thin and branching objects, as acknowledged by its authors [44]. Therefore, to obtain a satisfactory trunk extraction, we must mark the foreground seeds along the trunk carefully, as shown in Fig. 2b. This is a disadvantage of Lazy Snapping, but as long as the foreground seeds are selected carefully, the trunk extraction performance is quite good, as shown in Fig. 2d. Currently, the required processing time (including user selection of the foreground and background seeds and boundary editing) for this case is about 6 min. More complicated trees need more processing time, but not much more than this typical case.
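As a concrete illustration of the low-level clustering approach mentioned above, the following is a minimal k-means color-segmentation sketch (our own illustration, not the authors' code); the function name is hypothetical, plain RGB features are used, and spatial or texture features would simply be appended as extra columns of `feats`.

```python
import numpy as np

def kmeans_segment(image, k=4, n_iter=20, seed=0):
    """Cluster pixels by color with k-means; the cluster(s) covering the
    trunk can then be picked interactively and refined by the user."""
    h, w, _ = image.shape
    feats = image.reshape(-1, 3).astype(np.float64)
    rng = np.random.default_rng(seed)
    centers = feats[rng.choice(len(feats), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each pixel to its nearest cluster center.
        dists = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center; keep the old one if a cluster empties.
        for c in range(k):
            members = feats[labels == c]
            if len(members) > 0:
                centers[c] = members.mean(axis=0)
    return labels.reshape(h, w)
```

Fixing all mixture covariances to the identity and assuming equal priors, as described above, is exactly the simplification that reduces EM to this procedure.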


Fig. 2. Extracting trunk by Lazy Snapping. (a) Original image. (b) Foreground (red) and background (blue) seeds. (c) Initially extracted trunk by Lazy Snapping. (d) Extracted trunk after boundary editing. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this paper.)

3.3. Establishing 2D trunk structure

The skeleton of the extracted trunk provides a good basis for exploring the 3D trunk skeleton: if depth information can be acquired, we can establish the 3D trunk skeleton from these 2D skeleton points. Typically, there is no need to estimate the depths of all skeleton points; some sample points are enough, which simplifies the establishment of the 2D trunk structure. In our system, we utilize a thinning algorithm with hidden deletable pixel detection to obtain a bias-reduced trunk skeleton [48]. We then partition the skeleton into a number of small blocks called trunk elements. The central skeleton point in each trunk element, i.e., the 2D trunk point, is selected to represent that trunk element. Accompanying each trunk element is the trunk thickness, computed as the average number of thinning operations needed to obtain the skeleton points in the trunk element. The trunk thickness plays an important role in constructing the 3D trunk model, for it determines the radii of the generalized cylinders in trunk modeling.

After determining the trunk elements, the 2D trunk structure is established. Establishing the 2D trunk structure means connecting the trunk elements (or the 2D trunk points) according to the trunk context relations, so that the connected trunk elements characterize the actual 3D trunk structure in the 2D image plane. The concept of the trunk element and one example of an established 2D trunk structure are illustrated in Fig. 3. The 2D trunk structure is necessary for building the 3D trunk model, since it provides a way of connecting adjacent cylinders in trunk modeling. Moreover, the context relations between trunk elements can also be utilized to constrain the 3D trunk points; this issue is discussed in more detail in Section 3.5.

Generally, establishing the 2D trunk structure is not a difficult problem when the tree has a simple and distinct trunk.


Fig. 3. Illustration of trunk element and 2D trunk structure.


We can trace the trunk from the root and connect the trunk elements one by one. However, this is not always the case: sometimes there are cross-branches and partial occlusions between neighboring stems, as pictured in Fig. 4. For the case of cross-branches, by considering the colors, orientations, and motions of candidate trunk elements, it is still possible to find the correct connecting path if a sophisticated tracing algorithm is applied. For the case of partial occlusion, however, the extracted skeleton deviates from the true one, so that the trunk elements associated with the incorrect skeleton fail to represent the actual trunk structure. These trunk elements should be discarded when establishing the 2D trunk structure. Discarding them loses some details of the trunk structure, but as long as the stems are not seriously occluded, the resulting trunk structure can still represent the actual structure of the trunk. However, without recognizing the actual trunk structure of the tree, it is quite difficult to distinguish these invalid trunk elements from the correct ones. Therefore, in this work, establishing the 2D trunk structure is semi-automatic. The trunk elements are first connected by automatically tracing the trunk from the root. Typically, this simple tracing process causes some trunk elements to be falsely connected; these falsely connected trunk elements are then corrected via a graphical user interface, where they are reconnected or discarded by designed operations such as disable, enable, connect, and disconnect. Fig. 5 shows a snapshot of this user interface.

On the right-hand side of the figure is the original image, which serves as guidance for establishing the 2D trunk structure, while on the left side is the current 2D trunk structure. The user can select a trunk element and change its status via specific function icons (disable, enable, connect, disconnect, etc.) to produce a more accurate trunk structure. Typically, incorrect connections of trunk elements occur at the branching points, so we need to modify only a small portion of the trunk elements to obtain a satisfactory result. Hence, the user labor required at this stage is much less than that of trunk extraction.

3.4. Camera self-calibration

Camera self-calibration plays a central role in many techniques of structure from motion. The problem can be stated as: given a set of correspondences between several views, find the intrinsic and extrinsic parameters of the underlying camera. Typically, the projection of a 3D scene point M onto the image plane is described by the relation m ≃ PM, where P denotes the projection matrix of the camera and m is the corresponding 2D image point. The symbol ≃ indicates that the equation holds up to a scale factor, due to the homogeneous representations of M and m. Generally, without any restrictions on the projection matrices and the 3D scene points, there are infinitely many solutions satisfying this relation no matter how many views we have. This phenomenon is referred to as projective ambiguity, for the solutions satisfying this relation are all related by

Fig. 4. Examples of cross-branches and partial occlusion between stems. (a) Cross-branches: the branch running from the top right to the bottom left of the figure lies between the other two branches. However, because of image projection, the extracted skeletons (red points) are connected together and thus require a sophisticated tracing method to decompose them. (b) Partial occlusion between stems: portions of the two stems in the central part of the figure are merged together, so the extracted skeleton (red points) deviates from the true one (yellow points). Establishing the 2D trunk structure is more difficult in this case because the extracted skeleton is not the correct one. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this paper.)


Fig. 5. The user interface for establishing the 2D trunk structure.

a projective transformation. To calibrate the camera and obtain a metric reconstruction of the scene, some constraints on the camera should be imposed. With regard to a physical camera, the projection matrix P can be decomposed as K[R | −Rt], where the rotation matrix R and translation vector t are the extrinsic parameters of the camera, determining its orientation and position with respect to some world coordinate system. The matrix K is the camera calibration matrix, taking the following structure:

$$K = \begin{bmatrix} sf & a & u_0 \\ 0 & f & v_0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad (1)$$

where f is the focal length, s stands for the camera aspect ratio, a is the skew factor accounting for non-rectangular pixels, and (u0, v0) denotes the image principal point, i.e., the intersection of the optical axis with the image plane. These parameters are termed the intrinsic or internal parameters of the camera. With this physical camera model, the problem of self-calibration turns into finding the camera projection matrix P such that it can be decomposed as K[R | −Rt] with the special structures of K and R.
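To make the notation concrete, here is a small sketch (ours, not the authors' code) that assembles K from the intrinsic parameters of (1) and builds P = K[R | −Rt]; the default values mirror the simplifications the paper adopts later (unit aspect ratio, zero skew, principal point at the center of a 640 x 480 image) and are illustrative assumptions.

```python
import numpy as np

def calibration_matrix(f, s=1.0, a=0.0, u0=320.0, v0=240.0):
    """Camera calibration matrix K of Eq. (1). Defaults correspond to unit
    aspect ratio, zero skew, and a centered principal point (hypothetical
    values chosen for illustration)."""
    return np.array([[s * f, a,   u0],
                     [0.0,   f,   v0],
                     [0.0,   0.0, 1.0]])

def projection_matrix(K, R, t):
    """P = K [R | -Rt]: maps homogeneous 3D world points to image points."""
    return K @ np.hstack([R, (-R @ t).reshape(3, 1)])
```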

Specifically, suppose we have L views and N 3D scene points. Let P^{(i)} denote the projection matrix of the ith view, M_j be the jth 3D scene point, and m_j^{(i)} represent the jth image point in the ith view. Then, camera self-calibration can be formulated as the following optimization problem:

$$\begin{aligned} \text{minimize} \quad & \sum_{i,j}\left[\left(m_{j,x}^{(i)} - \frac{P_1^{(i)T} M_j}{P_3^{(i)T} M_j}\right)^{2} + \left(m_{j,y}^{(i)} - \frac{P_2^{(i)T} M_j}{P_3^{(i)T} M_j}\right)^{2}\right] \\ \text{subject to} \quad & P^{(i)} = K^{(i)}\bigl[R^{(i)} \,\big|\, -R^{(i)} t^{(i)}\bigr], \quad \forall i, \end{aligned} \qquad (2)$$

where (m_{j,x}^{(i)}, m_{j,y}^{(i)}) is the measured image coordinate of the jth point in the ith view and P_k^{(i)} denotes the kth row of P^{(i)}. This optimization problem states that we should look for K^{(i)}, R^{(i)}, t^{(i)}, and M_j such that the reprojection error, i.e., the sum of squared image distances between the measured and reprojected image points, is minimized. This formulation is similar to that of bundle adjustment [28] except for the constraint P^{(i)} = K^{(i)}[R^{(i)} | −R^{(i)} t^{(i)}], which is used for metric reconstruction and is generally omitted in bundle adjustment. Bundle adjustment has the advantage of being tolerant of missing data while providing a true ML estimate. Typically, bundle adjustment requires a good initialization to converge to a reasonable solution, and is therefore generally used as the final step of a reconstruction algorithm. However, bundle adjustment often becomes an extremely large minimization problem because of the large number of parameters involved, and can thus be quite costly.
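For reference, the reprojection error that (2) sums over all views can be evaluated per view with a few lines of NumPy (a sketch in the paper's notation; the function name is ours):

```python
import numpy as np

def view_reprojection_error(P, M, m):
    """Sum of squared distances between measured image points m (N x 2) and
    the projections of the 3D points M (N x 3) under the 3x4 matrix P."""
    Mh = np.hstack([M, np.ones((len(M), 1))])  # homogeneous 3D points
    proj = Mh @ P.T                            # rows are (x, y, w)
    predicted = proj[:, :2] / proj[:, 2:3]     # perspective division
    return float(np.sum((m - predicted) ** 2))
```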


In the following, we show that for the two-view case, under some reasonable assumptions on the camera, the optimization problem in (2) reduces to a one-variable minimization problem whose solution can easily be obtained by a one-dimensional search algorithm.

In this work, we use only two images for camera self-calibration. We assume that the aspect ratio s, the skew factor a, and the image principal point (u0, v0) are known prior to calibration; only the focal length f is unknown. Generally, the skew factor is quite close to zero and can be ignored, and the aspect ratio is close to one and can be directly set to one. The principal point normally requires other calibration techniques (e.g. [27]) to extract, but we can simply assume that it is located at the center of the image. This is not always true but, as will be seen in our experimental results, it still produces satisfactory results for 3D trunk model building. We also suppose that the unknown focal length is fixed for the two views, so the two calibration matrices are identical under our assumptions.¹ Generally, if the two images are extracted from an image sequence without zooming or refocusing, the assumption of a fixed focal length is reasonable; in other situations, we must carefully control the camera during capture to satisfy this condition.

We align the world coordinate system with the camera coordinate system of the first view, so the two projection matrices reduce to P^{(1)} = [K(f) | 0] and P^{(2)} = [K(f)R | −K(f)Rt], where, under our assumptions, the calibration matrix K is clearly a function of the focal length f. In fact, given an arbitrary f (which may not be the true f), the rotation matrix R and translation vector t can be estimated from the fundamental matrix (or essential matrix) and the given K; in this sense, R and t are also functions of f. Specifically, the fundamental matrix F can be expressed as K^{−T} R S K^{−1}, where S is the skew-symmetric matrix created from t, and the essential matrix is defined as E = RS; therefore, if K is given, the essential matrix can be calculated from F. Hartley demonstrated that the rotation R and translation t can be estimated via the singular value decomposition of E [50]. That is, if E = U D V^T, then

¹ In fact, our calibration algorithm can be extended to the case of a varying focal length [49]; however, for the sake of stability we assume a fixed focal length in the proposed system.

$$S = V Z V^{T}, \qquad (3)$$

$$R = U G V^{T} \quad \text{or} \quad R = U G^{T} V^{T}, \qquad (4)$$

where

$$G = \begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad Z = \begin{bmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}. \qquad (5)$$

By accounting for the two possible signs of t and the two possible choices of R, there are four possible configurations for the camera. However, only one survives a test with a single 3D point that determines whether the point is located in front of both views. After determining R and t, the 3D points M_j can be estimated by the linear triangulation or optimal triangulation method [28]; in this sense, M_j is also a function of f. Based on the previous discussion, the cost function in (2) turns into the following one-variable function:

$$E(f) = \sum_{i,j}\left[\left(m_{j,x}^{(i)} - \frac{P_1^{(i)T}(f)\, M_j(f)}{P_3^{(i)T}(f)\, M_j(f)}\right)^{2} + \left(m_{j,y}^{(i)} - \frac{P_2^{(i)T}(f)\, M_j(f)}{P_3^{(i)T}(f)\, M_j(f)}\right)^{2}\right], \qquad (6)$$

where P^{(1)}(f) = [K(f) | 0] and P^{(2)}(f) = [K(f)R(f) | −K(f)R(f)t(f)].

To better understand the characteristics of E(f), we conducted an experiment and investigated its behavior. We set up a virtual camera with focal length equal to 1000 and took two views of a family of 200 points randomly distributed on the surface of a sphere. The graph of E(f) for focal lengths ranging from 200 to 1800 is pictured in Fig. 6a. In this figure, there is only one minimum, corresponding to the true focal length of the virtual camera. The unimodality of E(f) in this range allows us to apply a one-dimensional search algorithm such as golden section search or Fibonacci search to locate the minimum [51]. However, E(f) is not always unimodal in this range of f; sometimes there exists another minimum at very low f, typically between 0 and 400. Nevertheless, we can narrow the search range to exclude the local minimum at low f so that E(f) remains unimodal on the desired interval. This is reasonable since in most cases we know the rough range of the true focal length from experience. Thus, we can restrict our search to a more practical interval of f.
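The whole two-view procedure can be sketched as follows (our own illustration, not the authors' implementation). It assumes a fundamental matrix F estimated beforehand (e.g., by the normalized 8-point algorithm) and matched pixel coordinates pts1, pts2; the pose extraction follows the standard Hartley-Zisserman recipe, whose convention for E may differ from the paper's E = RS by the side on which the skew-symmetric factor appears.

```python
import numpy as np

def K_of(f, u0, v0):
    """Calibration matrix under the paper's assumptions (Eq. (1), s = 1, a = 0)."""
    return np.array([[f, 0.0, u0], [0.0, f, v0], [0.0, 0.0, 1.0]])

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one correspondence; returns X in R^3."""
    A = np.vstack([x1[0] * P1[2] - P1[0],
                   x1[1] * P1[2] - P1[1],
                   x2[0] * P2[2] - P2[0],
                   x2[1] * P2[2] - P2[1]])
    X = np.linalg.svd(A)[2][-1]
    return X[:3] / X[3]

def pose_from_E(E, K, x1, x2):
    """Four (R, t) candidates from the SVD of E; keep the one that places a
    test correspondence (x1, x2) in front of both cameras."""
    U, _, Vt = np.linalg.svd(E)
    if np.linalg.det(U) < 0: U = -U          # E is only defined up to sign
    if np.linalg.det(Vt) < 0: Vt = -Vt
    G = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    for R in (U @ G @ Vt, U @ G.T @ Vt):
        for t in (U[:, 2], -U[:, 2]):
            P2 = K @ np.hstack([R, t.reshape(3, 1)])
            X = triangulate(P1, P2, x1, x2)
            if X[2] > 0 and (R @ X + t)[2] > 0:   # in front of both views
                return R, t
    return None, None

def calib_cost(f, F, pts1, pts2, u0, v0):
    """The one-variable reprojection cost E(f) of Eq. (6)."""
    K = K_of(f, u0, v0)
    E = K.T @ F @ K                           # essential matrix for this f
    R, t = pose_from_E(E, K, pts1[0], pts2[0])
    if R is None:
        return np.inf
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t.reshape(3, 1)])
    err = 0.0
    for x1, x2 in zip(pts1, pts2):
        X = np.append(triangulate(P1, P2, x1, x2), 1.0)
        for P, x in ((P1, x1), (P2, x2)):
            p = P @ X
            err += np.sum((x - p[:2] / p[2]) ** 2)
    return err

def golden_search(cost, lo, hi, tol=1e-2):
    """Golden-section search for the minimizer of a unimodal function on
    [lo, hi] (re-evaluates both probes per iteration for simplicity)."""
    g = (np.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c, d = b - g * (b - a), a + g * (b - a)
        if cost(c) < cost(d):
            b = d
        else:
            a = c
    return 0.5 * (a + b)
```

A call such as `golden_search(lambda f: calib_cost(f, F, pts1, pts2, 320.0, 240.0), 400.0, 1200.0)` then performs the one-dimensional search on the practical interval discussed above.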


Fig. 6. Camera self-calibration for the two-view case. The function E(f) for (a) simulated data and (b) real image data.

For instance, in our experiments of 3D trunk model building, the actual focal lengths always range from 400 to 1200, and therefore we can restrict our search to this interval; E(f) is always unimodal in this range of f.

To examine the accuracy of the proposed algorithm, we added Gaussian noise with standard deviation σn ranging from 0 to 1.0 to the corresponding points of the two views. The calibration results are shown in Table 1. The results of another technique, Bougnoux's method, are also listed in this table for comparison; Bougnoux provided a closed-form solution for calculating the focal length in the two-view case [37]. Before discussing the results, let us emphasize an important concept in camera

self-calibration: the critical motion sequence [52]. For camera self-calibration, there exist some special camera configurations that prevent the camera from being calibrated, and sequences of such special camera motions are referred to as critical motion sequences. The problem of critical motion sequences is inherent, so it cannot be resolved by any algorithm without additional knowledge. Because of the existence of critical motion sequences, the accuracy of camera self-calibration depends highly on the camera configuration; different camera configurations lead to quite different accuracies of the estimated focal length. Table 1 lists the results under far-from-critical and near-critical motion configurations.

Table 1. Comparison of the proposed method with Bougnoux's closed-form solution [37] under the condition of fixed focal length.

| Noise level | | Far from critical motion: our method | Far from critical motion: Bougnoux | Near critical motion: our method | Near critical motion: Bougnoux |
|---|---|---|---|---|---|
| σn = 0.0 | Mean | 1000.57 (0.057%) | 1000.00 (0.0%) | 1000.57 (0.057%) | 1000.00 (0.0%) |
| | St. dev. | 0.0 | 0.0 | 0.0 | 0.0 |
| σn = 0.2 | Mean | 999.89 (0.011%) | 999.83 (0.017%) | 907.64 (9.236%) | 869.76 (13.024%) |
| | St. dev. | 13.02 | 13.15 | 237.92 | 295.60 |
| σn = 0.4 | Mean | 1000.22 (0.022%) | 1000.18 (0.018%) | 862.60 (13.74%) | 786.39 (21.36%) |
| | St. dev. | 26.04 | 26.37 | 277.14 | 326.82 |
| σn = 0.6 | Mean | 1001.04 (0.104%) | 1001.06 (0.106%) | 838.58 (16.142%) | 739.33 (26.067%) |
| | St. dev. | 39.21 | 39.81 | 285.42 | 324.80 |
| σn = 0.8 | Mean | 1002.37 (0.237%) | 1002.48 (0.248%) | 838.79 (16.121%) | 774.42 (22.558%) |
| | St. dev. | 52.67 | 53.64 | 281.99 | 339.22 |
| σn = 1.0 | Mean | 1004.22 (0.422%) | 1004.49 (0.449%) | 814.18 (18.582%) | 766.96 (23.304%) |
| | St. dev. | 66.66 | 68.06 | 287.47 | 325.34 |

The true focal length is 1000; the numbers in parentheses are the relative errors.


For the far-from-critical motion case, both our method and Bougnoux's yield quite accurate focal lengths even under serious disturbance (σn = 1.0); the relative error of f produced by our method is only 0.422%, with standard deviation 66.66. However, for the near-critical motion case, our approach gave a more accurate focal length with a narrower uncertainty range than Bougnoux's closed-form solution.

To demonstrate the feasibility of the proposed method on real images, the images shown in Fig. 1 were tested by our calibration module, and the resulting E(f) is pictured in Fig. 6b. This figure reveals one minimum in the reasonable interval, which coincides with our expectation; the f corresponding to the minimum of E(f) is, of course, our estimated focal length. More results for real images are given in Section 4.

3.5. 3D trunk skeleton recovery

Since the context relations between trunk points have been established in the module of 2D Trunk Structure Establishing, recovering the 3D trunk skeleton is equivalent to estimating the 3D position of each trunk point, i.e., computing the 3D trunk points. Typically, recovering the 3D points is not a difficult problem as long as the camera is fully calibrated and the corresponding points are correctly identified; simple triangulation can achieve this. However, because of unavoidable errors in the correspondences, there is an uncertainty region in 3D point recovery. This uncertainty region becomes elongated when the rays through corresponding points tend toward parallel, as pictured in Fig. 7. To relieve this effect, the lines of sight of the two views should be quite different. Unfortunately, to facilitate the search for corresponding points, the two images are required to be only slightly different, which implies that the rays through corresponding points are close to parallel; the effect of reconstruction uncertainty is therefore inevitable.

To overcome this problem, we introduce a smoothness constraint into the computation of the 3D trunk points. Since the 3D trunk skeleton can be seen as a curve in 3D space, the curvature of the trunk skeleton provides a good constraint to restrict the position of each 3D trunk point [53]. In other words, we can formulate an optimization problem that computes the 3D trunk points while minimizing the curvature of the recovered 3D trunk skeleton.

Fig. 7. Uncertainty in corresponding points leads to uncertainty in reconstruction. The dashed lines indicate the uncertainty range of corresponding points and the shaded region illustrates the uncertainty region of reconstruction, which becomes quite elongated when the rays tend toward parallel.

This minimum curvature constraint produces a force that pulls deviated 3D trunk points back to their correct positions.

Generally, errors in correspondences reside in both images. Since the 2D trunk points are obtained from the skeleton of the extracted trunk, we can reasonably assume that the 2D trunk points are exactly what we want, and hence error-free; errors in correspondences are introduced only by the matching process and thus reside only in the points of the second image. With this assumption, and by the projection equation m_j^{(1)} ≃ [K | 0] M_j, a parameterization of the jth 3D trunk point can be obtained as follows:

$$X_j = \left[a_j z_j,\; b_j z_j,\; z_j\right]^{T}, \quad z_j \in \mathbb{R}, \qquad (7)$$

where a_j = (m_{j,x}^{(1)} − u_0)/f, b_j = (m_{j,y}^{(1)} − v_0)/f, and X_j is the inhomogeneous interpretation of M_j.²

² Here, we have assumed a zero skew factor and unit aspect ratio for simplicity; for a nonzero skew factor and nonunit aspect ratio, a similar parameterization can also be obtained.

By utilizing the projection equation m_j^{(2)} ≃ P^{(2)} M_j, two linear equations in z_j are derived:

$$m_{j,x}^{(2)} = \frac{p_{11} a_j z_j + p_{12} b_j z_j + p_{13} z_j + p_{14}}{p_{31} a_j z_j + p_{32} b_j z_j + p_{33} z_j + p_{34}}, \qquad (8)$$

$$m_{j,y}^{(2)} = \frac{p_{21} a_j z_j + p_{22} b_j z_j + p_{23} z_j + p_{24}}{p_{31} a_j z_j + p_{32} b_j z_j + p_{33} z_j + p_{34}}, \qquad (9)$$

where p_{ij} refers to the (i, j) entry of P^{(2)}. The depth of each 3D trunk point can be recovered by solving these linear equations in the least-squares sense, i.e., by minimizing the following quadratic function:

$$C(z) = \sum_{j=1}^{N} \frac{1}{c_j^2 + e_j^2}\left[(c_j z_j + d_j)^2 + (e_j z_j + f_j)^2\right] = \sum_{j=1}^{N}\left[z_j^2 + \frac{2(c_j d_j + e_j f_j)}{c_j^2 + e_j^2}\, z_j + \frac{d_j^2 + f_j^2}{c_j^2 + e_j^2}\right], \qquad (10)$$

where z = (z_1, ..., z_N) is a vector encapsulating the depth information of all trunk points, N is the number of trunk points to be reconstructed, and

$$c_j = a_j\bigl(m_{j,x}^{(2)} p_{31} - p_{11}\bigr) + b_j\bigl(m_{j,x}^{(2)} p_{32} - p_{12}\bigr) + m_{j,x}^{(2)} p_{33} - p_{13}, \qquad (11)$$

$$d_j = m_{j,x}^{(2)} p_{34} - p_{14}, \qquad (12)$$

$$e_j = a_j\bigl(m_{j,y}^{(2)} p_{31} - p_{21}\bigr) + b_j\bigl(m_{j,y}^{(2)} p_{32} - p_{22}\bigr) + m_{j,y}^{(2)} p_{33} - p_{23}, \qquad (13)$$

$$f_j = m_{j,y}^{(2)} p_{34} - p_{24}. \qquad (14)$$

Eq. (10) does not include the minimum curvature constraint. The curvature of the trunk skeleton can be approximated by

$$\left\| \frac{1}{s_{kj}}\,(X_k - X_j) - \frac{1}{s_{ji}}\,(X_j - X_i) \right\|, \qquad (15)$$

where X_i, X_j, and X_k indicate three successive trunk points on the skeleton with the order i → j → k. The relations of X_i, X_j, and X_k are obtained from the established 2D trunk structure. The symbol s_kj denotes the arc length between X_j and X_k. Typically, if the 3D trunk points are equally spaced, the arc length can be set to 1, but unfortunately in real cases they are not equally spaced. Recall that when cross-branches or partial occlusions between stems occur (see Fig. 4), the connected trunk elements may span a rather large distance to ensure the correct establishment of the 2D trunk structure; in such cases, the arc lengths s_kj and s_ji are severely unbalanced. To compensate for this effect, s_kj is selected according to the projected image distance between X_k and X_j, i.e.,

$$s_{kj} = \frac{1}{B}\sqrt{\bigl(m_{k,x}^{(1)} - m_{j,x}^{(1)}\bigr)^2 + \bigl(m_{k,y}^{(1)} - m_{j,y}^{(1)}\bigr)^2}, \qquad (16)$$

where B is the block size of the trunk element, used to normalize the unit from pixels to blocks, and (m_{k,x}^{(1)}, m_{k,y}^{(1)}) and (m_{j,x}^{(1)}, m_{j,y}^{(1)}) are the corresponding image points of X_k and X_j in the first image. Although this s_kj is not the true arc length between X_j and X_k, it efficiently reduces the effect of the unbalanced situation between X_k and X_j and still generates satisfactory results, as will be seen in our experiments. Combining the minimum curvature constraint with the original formulation, a new cost function is generated as follows:

$$\begin{aligned} C(z) &= \sum_{j=1}^{N} w_j \left[ z_j^2 + \frac{2(c_j d_j + e_j f_j)}{c_j^2 + e_j^2}\, z_j + \frac{d_j^2 + f_j^2}{c_j^2 + e_j^2} \right] + \sum_{\{i,j,k\} \in S} \left\| \frac{1}{s_{kj}}\,(X_k - X_j) - \frac{1}{s_{ji}}\,(X_j - X_i) \right\|^2 \\ &= \sum_{j=1}^{N} w_j \left[ z_j^2 + \frac{2(c_j d_j + e_j f_j)}{c_j^2 + e_j^2}\, z_j + \frac{d_j^2 + f_j^2}{c_j^2 + e_j^2} \right] + \sum_{\{i,j,k\} \in S} \left[ a_{ijk} z_k^2 + b_{ijk} z_j^2 + c_{ijk} z_i^2 + d_{ijk} z_j z_k + k_{ijk} z_i z_j + g_{ijk} z_k z_i \right], \end{aligned} \qquad (17)$$

where S indicates the set of all triples of successive trunk points on the trunk skeleton, and

$$a_{ijk} = \frac{1}{s_{kj}^2}\bigl(a_k^2 + b_k^2 + 1\bigr), \qquad (18)$$

$$b_{ijk} = \left(\frac{1}{s_{kj}} + \frac{1}{s_{ji}}\right)^{2}\bigl(a_j^2 + b_j^2 + 1\bigr), \qquad (19)$$

$$c_{ijk} = \frac{1}{s_{ji}^2}\bigl(a_i^2 + b_i^2 + 1\bigr), \qquad (20)$$

$$d_{ijk} = -\frac{2}{s_{kj}}\left(\frac{1}{s_{kj}} + \frac{1}{s_{ji}}\right)\bigl(a_k a_j + b_k b_j + 1\bigr), \qquad (21)$$

$$k_{ijk} = -\frac{2}{s_{ji}}\left(\frac{1}{s_{kj}} + \frac{1}{s_{ji}}\right)\bigl(a_i a_j + b_i b_j + 1\bigr), \qquad (22)$$

$$g_{ijk} = \frac{2}{s_{kj} s_{ji}}\bigl(a_k a_i + b_k b_i + 1\bigr). \qquad (23)$$

These formulas are derived by substituting (7) into the minimum curvature constraint and expanding it. The weight w_j is a parameter controlling the relative significance between the data and minimum curvature constraints at the jth point. Since not all corresponding points are correctly matched, some data constraints are unreliable; these unreliable data constraints are suppressed by controlling the value of w_j.


As usual, curvature provides a good way to detect the reliability of each data constraint. However, unlike the previous case, we utilize the curvature without the minimum curvature constraint to determine w_j: we first recover the 3D trunk points using the linear triangulation method, and then estimate the curvature at each trunk point from these recovered points. Since no minimum curvature constraint is imposed, the 3D trunk points recovered from severely mismatched correspondences deviate greatly from the true trunk skeleton, inducing large curvatures at these points. Based on this consideration, the weight w_j is given by

$$w_j = c\, \exp\{-\kappa_j^2 / \sigma^2\}, \qquad (24)$$

where c is a constant controlling the overall scale, κ_j is the estimated curvature at the jth point, and σ is the scale parameter, selected as the median of the κ_j. This weight efficiently suppresses unreliable data constraints and yields satisfactory results in 3D trunk skeleton recovery.

The cost function given by (17) is a quadratic and convex function of the z_j, and by elementary algebra it can be rewritten in matrix-vector form as C(z) = z^T A z − 2 z^T b + c. Therefore, minimizing this quadratic and convex function is equivalent to solving the linear system Az = b; in other words, the depths of all trunk points can be obtained as z = A^{−1}b. One example of 3D recovery with the minimum curvature constraint imposed in the computation of 3D points is shown in Fig. 8.

where c is a constant controlling the overall scale, jj is the estimated curvature at the jth point, and r is the scale parameter, which is selected as the median of jj. This weight can efficiently suppress unreliable data constraints and yield satisfactory results in 3D trunk skeleton recovery. The cost function given by (17) is a quadratic and convex function of zj, and by elementary algebra computation, this function can be rewritten in a matrix-vector form as C(z) = zTAz  2zTb + c. Therefore, minimizing this quadratic and convex function is equivalent to solving a linear system Az = b. In other words, the depths of all trunk points can be easily obtained by the equation z = A1b. One example of 3D recovery by imposing the minimum curvature constraint into the computation of 3D points is shown in Fig. 8. Fig. 8a shows a helix in the three-dimensional space. We took two a

3.6. Building 3D trunk model

In this section, the method for building the 3D trunk model is introduced. Since we have recovered the 3D trunk skeleton, we can model the trunk using a set of generalized cylinders as long as the trunk radius at each 3D trunk point is determined. Because we have calibrated the camera, the trunk radius can easily be calculated by similar triangles as follows:


Fig. 8. An example of 3D recovery of a helix with and without minimum curvature constraint. (a) True helix. (b) Reconstructed helix without minimum curvature constraint. (c) Reconstructed helix with minimum curvature constraint.


Fig. 9. Recovered 3D trunk points for the test images shown in Fig. 1. (a) 3D reconstruction using linear triangulation method. (b) 3D reconstruction in conjunction with the minimum curvature constraint.

$$r_i = \frac{t_i}{f}\, z_i, \qquad (25)$$

where r_i denotes the trunk radius at point X_i, t_i is the trunk thickness at the ith trunk point in the image plane (defined as the average number of thinning operations needed to obtain the skeleton points in the trunk element), z_i is the depth of X_i, and f is the estimated focal length.

As illustrated in Fig. 10, the generalized cylinder can be constructed by determining the vertices around two successive 3D trunk points X_i and X_j and connecting adjacent vertices to form the surface of the trunk. To facilitate the calculation of these vertices, we require a frame field at each 3D trunk point. Let {T_i, N_i, B_i} denote the frame field at X_i, as shown in Fig. 10. A base point X_b is selected as the basis for computing all the vertices on the generalized cylinders.


Fig. 10. Creating generalized cylinder by means of frame fields at the recovered 3D trunk points.

The base point is generally the 3D trunk point at the root of the tree, and the frame field associated with it is the natural frame field. The vertices around the base point, denoted p_{b,k}, are determined by sampling points on the circle lying in the NB-plane at X_b with the radius r_b calculated by (25). Then, by utilizing the frame field at X_i, the vertices p_{i,k} around the 3D trunk point X_i can be computed by (see [54])

$$p_{i,k} = H_i\, p_{b,k}, \qquad (26)$$

where the transformation H_i is defined as

$$H_i = \begin{bmatrix} s_i N_{i,1} & s_i B_{i,1} & s_i T_{i,1} & X_{i,1} \\ s_i N_{i,2} & s_i B_{i,2} & s_i T_{i,2} & X_{i,2} \\ s_i N_{i,3} & s_i B_{i,3} & s_i T_{i,3} & X_{i,3} \\ 0 & 0 & 0 & 1 \end{bmatrix}. \qquad (27)$$

The scale s_i is a parameter controlling the trunk radius at X_i and is calculated as r_i/r_b. The numbers N_{i,j}, B_{i,j}, T_{i,j}, and X_{i,j} denote the jth components of the vectors N_i, B_i, T_i, and X_i, respectively.

48

C.-H. Teng et al. / Graphical Models 69 (2007) 33–56

However, an arbitrary selection of N_j and B_j will lead to undesired twisting of the resulting cylinder. Thus, to set up a frame field while avoiding twisting, we developed the following formulas to determine the frame field at X_j:

T_j = (X_j - X_i) / \| X_j - X_i \|,    (28)

N_j = T_j \times B_i,    (29)

B_j = N_j \times T_j,    (30)

where the symbol \times indicates the cross product. Note that in (29) the previous B is used to determine the new N, so the phenomenon of twisting is naturally avoided. With the formulas given by (28)–(30), the frame fields for all 3D trunk points can be determined by tracing the recovered 3D trunk skeleton from the root; meanwhile, the vertices on each generalized cylinder are computed by (26). By connecting corresponding vertices to form the edges of each polygon, a polygonal mesh representing the trunk can then be constructed. The created polygonal mesh inevitably has some superimpositions at the branching points of the trunk; however, these have little visual effect on the final rendering, and a refinement process can be applied to resolve them [55]. One example of the created polygonal mesh is illustrated in Fig. 11a.
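The following Python sketch strings Eqs. (25)–(30) together for one generalized cylinder. The function names are hypothetical, and the normalization of N_j is an added numerical safeguard (T_j and the previous B_i need not be exactly perpendicular); it is not stated in the paper.

```python
import numpy as np

def base_ring(r_b, K=16):
    """K sample points on the circle of radius r_b in the local NB-plane
    of the base point, as 4xK homogeneous column vectors."""
    a = np.linspace(0.0, 2.0 * np.pi, K, endpoint=False)
    return np.vstack([r_b * np.cos(a), r_b * np.sin(a),
                      np.zeros_like(a), np.ones_like(a)])

def next_frame(T_i, B_i, X_i, X_j):
    """Twist-free frame propagation, Eqs. (28)-(30)."""
    T_j = (X_j - X_i) / np.linalg.norm(X_j - X_i)
    N_j = np.cross(T_j, B_i)
    N_j /= np.linalg.norm(N_j)      # safeguard: keep the frame unit-length
    B_j = np.cross(N_j, T_j)
    return T_j, N_j, B_j

def ring_at(X_i, T_i, N_i, B_i, s_i, p_b):
    """Eqs. (26)-(27): map the base ring p_b (4xK) through H_i, whose
    rotation columns are s_i*N_i, s_i*B_i, s_i*T_i and whose translation
    column is X_i."""
    H = np.eye(4)
    H[:3, 0] = s_i * N_i
    H[:3, 1] = s_i * B_i
    H[:3, 2] = s_i * T_i
    H[:3, 3] = X_i
    return H @ p_b                  # homogeneous vertices around X_i
```

For the base point itself, s_b = r_b/r_b = 1, so the ring is placed unscaled; quads are then formed by joining vertex k of one ring to vertex k of the next, which yields the polygonal mesh described above.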

Since the camera has been calibrated, the trunk texture can easily be generated by projecting all the vertices of the polygonal mesh onto the image plane and interpolating the colors between them. Texture mapping produces a quite realistic 3D trunk model, as pictured in Fig. 11b, even though only simple generalized cylinders are used to model the trunk.

Fig. 11. 3D trunk model created from the test images shown in Fig. 1. (a) Polygonal mesh of the trunk. (b) Textured 3D trunk model.

4. Experiments and discussion

In this section, the detailed setup of our experiments and further experimental results are given. The camera self-calibration experiments require the most explanation of their procedures and are therefore presented first. Following this, further experimental results for 3D trunk skeleton recovery are shown. We then constructed several 3D trunk models and rendered them from several different viewpoints to visualize the resulting trunks. Finally, the section ends with a discussion of the restrictions of the proposed system.

4.1. Camera self-calibration

The test images used in our experiments were all captured by a digital camera at an image size of 640 × 480 pixels. Because our algorithm requires the camera focal length to remain unchanged, all test images were captured under fixed camera settings, i.e., without refocusing or zooming between the two views. Before calibrating the camera, image correspondences must be established first.

Since the two images are assumed to be only slightly different, image correspondences can easily be found by matching a local region between the two images. In addition to this block matching approach, optical flow provides another way of locating image correspondences. Owing to the rapid development of optical flow techniques over the past decades, today's optical flow estimation algorithms can produce very accurate image motions, so optical flow can be expected to yield more accurate image correspondences except in some special situations. In the proposed system, we exploited a gradient-based regularization method [56] to estimate the optical flow and hence identify the corresponding points. Our camera self-calibration algorithm does not require every point in the image to be processed; a sparse set of points is sufficient. Since corner points provide a more robust image feature for locating correspondences, only corners were employed to calibrate the camera, and Harris's corner detector [57] was used to locate them.
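A minimal Python/OpenCV sketch of this correspondence stage is shown below. The file names are hypothetical, and pyramidal Lucas–Kanade tracking is substituted for the gradient-based regularized optical flow [56] used in the paper, as a readily available stand-in.

```python
import cv2
import numpy as np

# Hypothetical file names for the two slightly different views.
img1 = cv2.imread("trunk_view1.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("trunk_view2.png", cv2.IMREAD_GRAYSCALE)

# Sparse corners selected with the Harris criterion [57].
corners = cv2.goodFeaturesToTrack(img1, maxCorners=1000, qualityLevel=0.01,
                                  minDistance=7, useHarrisDetector=True, k=0.04)

# Track the corners into the second view to obtain correspondences.
tracked, status, _ = cv2.calcOpticalFlowPyrLK(img1, img2, corners, None)
ok = status.ravel() == 1
pts1 = corners.reshape(-1, 2)[ok]
pts2 = tracked.reshape(-1, 2)[ok]
```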

After establishing the image correspondences, the fundamental matrix is computed. In our experiments, we employed a robust technique, least-median-of-squares (LMedS), to estimate the fundamental matrix. Mismatched corresponding points, i.e., outliers, can also be detected by LMedS, and these outliers were removed in our calibration method to achieve a more accurate camera calibration. We assumed zero skew and unit aspect ratio for the camera, and the principal point was taken to be at the center of the image. These assumptions are only approximations, not the true values; however, as will be seen in our experimental results, they are reasonable, and satisfactory 3D trunk models can still be generated. We depict the graphs of E(f) for several real cases in Fig. 12. The figure reveals that there are sometimes two local minima in this range of f; however, one of them is located at a low f and can hence be excluded by narrowing the searching range of f.
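The sketch below illustrates this stage in Python/OpenCV under the stated camera assumptions. The exact form of the cost E(f) used by our algorithm is not reproduced here; the reprojection-error functional below is an assumed stand-in, and SciPy's bounded scalar minimizer replaces the Fibonacci search [51] described next.

```python
import cv2
import numpy as np
from scipy.optimize import minimize_scalar

# pts1, pts2: corresponding points from the previous sketch (Nx2).
F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_LMEDS)   # robust LMedS fit
inl1 = pts1[mask.ravel() == 1].astype(np.float64)            # drop detected outliers
inl2 = pts2[mask.ravel() == 1].astype(np.float64)

def E(f, w=640, h=480):
    """Assumed form of E(f): reconstruct the scene for a candidate focal
    length and return the total squared reprojection error."""
    K = np.array([[f, 0.0, w / 2.0],        # zero skew, unit aspect ratio,
                  [0.0, f, h / 2.0],        # principal point at image centre
                  [0.0, 0.0, 1.0]])
    Em = K.T @ F @ K                        # essential matrix for this f
    _, R, t, _ = cv2.recoverPose(Em, inl1, inl2, K)
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    X = cv2.triangulatePoints(P1, P2, inl1.T, inl2.T)         # 4xN homogeneous
    X /= X[3]
    x1, x2 = P1 @ X, P2 @ X
    r1 = x1[:2] / x1[2] - inl1.T
    r2 = x2[:2] / x2[2] - inl2.T
    return float(np.sum(r1 ** 2) + np.sum(r2 ** 2))

# Bounded 1D search over f on the interval used in the paper.
res = minimize_scalar(E, bounds=(400, 1200), method="bounded")
print("estimated focal length (pixels):", res.x)
```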

Fig. 12. The graphs of E(f ) for several real cases. (Each of the four panels (a)–(d) plots the reprojection error against the focal length f.)

In our experiments, the Fibonacci search method [51] was utilized to locate the minimum, with a searching interval for f of (400, 1200). The Fibonacci search method yields the narrowest uncertainty range for the extremum with the fewest function evaluations; in our experiments, only 15 function evaluations were sufficient to reduce the uncertainty range of f to 1 pixel. The estimated focal lengths for the four pairs of test images shown in Fig. 12 are 875, 651, 951, and 866 pixels, respectively.

The processing time of our camera self-calibration algorithm depends on the number of points used for calibration and on the numerical method used to reconstruct the 3D points. For the four pairs of images shown in Fig. 12, the number of calibration points ranged from 800 to 1000, and the processing time was about 1.5–2.3 s on a PC with a Pentium IV 2.4 GHz CPU and 768 MB of RAM. In fact, the overall processing time of our system is dominated by the interactive image cut tool for trunk extraction, which often requires several minutes to achieve an acceptable result; the camera self-calibration is thus quite efficient compared with the other modules of the proposed system.

4.2. 3D trunk skeleton recovery and 3D trunk model building

In the proposed system, we utilize a smoothness constraint derived from the curvature of a space curve to acquire more accurate 3D trunk points. Before the 3D trunk points can be computed, the corresponding trunk points in the second image must be identified. Typically, optical flow provides quite accurate image correspondences except at positions where motion discontinuities occur. Unfortunately, the motion of a trunk usually differs from that of its background, so motion discontinuities always arise at the edges of the trunk; locating trunk correspondences with optical flow is therefore sometimes inaccurate, especially for small stems and branches. In our implementation, we instead employed the block matching approach to search for the trunk correspondences. Since the fundamental matrix has been computed in a previous module, the search range for each correspondence can be restricted to a line, the epipolar line. In general, we cannot ensure that all correspondences are correctly matched, and some outliers will be introduced; this problem is overcome by the smoothness, or minimum curvature, constraint. In our experiments, the parameter c in (24) was set to 0.005 for all test images.
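The epipolar-constrained block matching just described can be sketched as follows. This is a minimal Python illustration rather than the authors' implementation; the helper name, patch size, line-sampling density, and sum-of-squared-differences score are all assumed choices.

```python
import numpy as np

def match_on_epipolar_line(img1, img2, x1, F, half=7, n_samples=200):
    """For a trunk point x1 = (u, v) in image 1, search along its epipolar
    line l = F [u, v, 1]^T in image 2 and return the candidate with the
    smallest sum-of-squared-differences (SSD) patch score."""
    u, v = int(round(x1[0])), int(round(x1[1]))
    patch = img1[v - half:v + half + 1, u - half:u + half + 1].astype(np.float64)
    a, b, c = F @ np.array([x1[0], x1[1], 1.0])    # line a*u' + b*v' + c = 0
    h, w = img2.shape[:2]
    best_score, best_pt = np.inf, None
    for u2 in np.linspace(half, w - half - 1, n_samples):
        if abs(b) < 1e-9:          # near-vertical line: would sample over v instead
            break
        v2 = int(round(-(a * u2 + c) / b))
        u2i = int(round(u2))
        if not (half <= v2 < h - half):
            continue
        cand = img2[v2 - half:v2 + half + 1,
                    u2i - half:u2i + half + 1].astype(np.float64)
        score = np.sum((patch - cand) ** 2)
        if score < best_score:
            best_score, best_pt = score, (u2i, v2)
    return best_pt
```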

One example of the reconstructed 3D trunk points for a real case has already been illustrated in Fig. 9; in this section, more examples are given. The recovered 3D trunk points for the four pairs of images in Fig. 12 are depicted in Fig. 13. In each subfigure of Fig. 13, the left plot shows the result of the linear triangulation method and the right plot the result obtained with the minimum curvature constraint.


Fig. 13. Recovered 3D trunk points with and without smoothness constraint.


This figure illustrates that the minimum curvature constraint can efficiently amend the scattered points and produce more accurate 3D trunk points.

To examine the created 3D trunk models more closely, we constructed a simple virtual environment in which to place the textured 3D trunk models and rendered them from several different viewpoints. Fig. 14 shows several views of a real tree together with the created 3D trunk model under the corresponding viewpoints. In this example, the 3D trunk model is created from the leftmost image, while the other images are used only for inspection. From this figure we can see that, apart from some minor details and the different lighting conditions, the created 3D trunk model is quite similar to the actual tree. Since our texture mapping is accomplished by projecting each vertex of the polygonal mesh onto the image plane and interpolating the intermediate pixels using the obtained colors, the back of the trunk is also textured, as revealed in this figure. This is not strictly correct, but it produces quite good visual results; obtaining the correct texture on the back of the trunk would require more views of the trunk and a fusing algorithm to combine the results from different viewpoints, which is beyond the scope of this paper.
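The projection-based texturing can be sketched in Python as follows. Computing per-vertex texture coordinates by projection is a standard stand-in for the authors' project-and-interpolate procedure; the function name and the bottom-left UV origin convention are assumptions.

```python
import numpy as np

def vertex_uvs(V, K, w=640, h=480):
    """Project mesh vertices V (Nx3, expressed in the first camera's frame)
    with x = K [I | 0] X and normalize to [0, 1] texture coordinates. The
    v-flip assumes a bottom-left texture origin, as in OpenGL."""
    x = K @ V.T                    # 3xN homogeneous image points
    u = x[0] / x[2]
    v = x[1] / x[2]
    return np.column_stack([u / w, 1.0 - v / h])
```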


In addition to this example, more 3D trunk models were constructed, as shown in Fig. 15. The first column of Fig. 15 displays the original images, and the second column shows the front views of the created 3D trunk models. These plots reveal that our 3D trunk models are quite similar to the actual trees in the images. To further explore the created 3D trunk models, the back and side views of these trunks are depicted in the third and fourth columns of Fig. 15. By inspecting the real trees, we found that the created 3D trunk models faithfully reflect the major 3D trunk structure of the trees. Our system is also applicable to trees with complex branches, as demonstrated by the second and fourth trees in Fig. 15. In fact, as long as the trunk is not severely occluded and the trunk relations are correctly established, our system can yield rather satisfactory results.

4.3. Discussion

In general, our system performs well in most situations except for some special cases. Recall that there are critical motion sequences that prevent the camera from being calibrated.

Fig. 14. A real tree from different viewpoints and the corresponding views of the 3D trunk model.


Fig. 15. Examples of created 3D trunk models. The first column shows the original images; the second, third, and fourth columns show the front, back, and side views of the created 3D trunk models.

Indeed, it is these critical motion sequences that produce the first restriction of our system: the camera cannot be arbitrarily configured. For the two-view case in our application, two critical configurations should be emphasized.

If the camera undergoes a pure translation, or a pure translation combined with an arbitrary rotation about the optical axis, it is in critical motion [52]; these two camera configurations should be avoided in our application.


Typically, if the camera undergoes a large translation and rotation, critical motion is unlikely to occur. However, since we require the input images to be only slightly different so that corresponding points can easily be located, the camera should not be moved too much. In general, a small translation together with a slight rotation suffices to produce images that satisfy our requirements: the images are only slightly different and the camera is not critically configured. All the test images in our experiments were captured in this way, and the results were satisfactory.

Another restriction of the proposed method is occlusion. There are two types of occlusion: that caused by other trunks and that caused by leaves. Mutually occluded trunks prevent the correct 3D trunk model from being constructed, while occlusions caused by leaves generate incorrect trunk texture and complicate the search for correspondences. The problem of mutually occluded trunks was addressed in Section 3.3, where it complicates the establishment of the 2D trunk structure. In general, if the trunk is not seriously occluded, a reasonable 3D trunk model can still be generated; however, if a trunk is fully occluded by another trunk, its 3D model obviously cannot be recovered, and the occluded trunk is omitted from the final model. The problem of occlusion is inherent, since the information for the occluded trunk is absent; solving it would require more views and a fusing algorithm to combine the obtained results. At present, our system cannot deal with occlusion, so the camera poses should be carefully adjusted during image capture to keep occlusions as few as possible.

5. Conclusion and future work

In this paper we have presented a systematic approach for creating a 3D trunk model from two images, such that the created trunk resembles the actual tree. The proposed system is semi-automatic, i.e., some manual adjustments are still required; however, with an interactive segmentation tool and a well-designed user interface, the task becomes simple enough that a general user without any experience can perform it well. The user captures two images of the tree with a camera undergoing a small translation together with a slightly different orientation, which ensures that the two images are only slightly different while avoiding critical configurations. Since the images are slightly different, image correspondences can easily be identified by means of block matching or optical flow estimation. We developed a camera self-calibration algorithm suitable for the two-view case to estimate the camera parameters automatically, and a minimum curvature constraint is formulated into the recovery of the 3D trunk skeleton to alleviate the effects of reconstruction uncertainty and mismatched corresponding points. We employ a set of generalized cylinders to model the trunk, and since the camera is calibrated, texture mapping is easily achieved; with the trunk texture, a quite realistic 3D trunk model similar to the real one is generated. Because the generalized cylinder is a common primitive for plant modeling in computer graphics, our approach is compatible with other graphical methods for further developments such as interaction with the environment.

Currently, our method is only applicable to trees with distinct trunks and branches. Shrubs and bushes are unsuitable for the proposed system, since their stems are generally severely occluded by other branches or leaves; however, these types of plants can be modeled by the graphical methods mentioned in Section 2. By combining our approach with these graphical methods, we believe that a rich natural environment can be constructed, and the resulting virtual environment is more realistic because some of its trees are generated from images of real trees. At present, our system produces only realistic 3D trunk models; no leaves are generated. Our future work is to generate the leaves so that the created 3D tree models have a global appearance even closer to the real trees.

Acknowledgments

This work was supported in part by the National Science Council, Taiwan, Republic of China, under Grant No. NSC 93-2213-E-007-013.

References

[1] J. Bloomenthal, Modeling the mighty maple, in: Proceedings of SIGGRAPH 85, 1985, pp. 305–311.
[2] P. de Reffye, C. Edelin, J. Françon, M. Jaeger, C. Puech, Plant models faithful to botanical structure and development, Computer Graphics 22 (4) (1988) 151–158.
[3] B. Lintermann, O. Deussen, Interactive modeling of plants, IEEE Computer Graphics and Applications 19 (1) (1999) 56–65.
[4] J.L. Power, A.J.B. Brush, P. Prusinkiewicz, D.H. Salesin, Interactive arrangement of botanical L-system models, in: Proceedings of the 1999 Symposium on Interactive 3D Graphics, 1999, pp. 175–182, 234.

[5] K. Onishi, S. Hasuike, Y. Kitamura, F. Kishino, Interactive modeling of trees by using growth simulation, in: Proceedings of the ACM Symposium on Virtual Reality Software and Technology, 2003.
[6] I. Shlyakhter, M. Rozenoer, J. Dorsey, S. Teller, Reconstructing 3D tree models from instrumented photographs, IEEE Computer Graphics and Applications 21 (3) (2001) 53–61.
[7] A. Lindenmayer, Mathematical models for cellular interaction in development, Journal of Theoretical Biology 18 (1968) 280–315.
[8] A.R. Smith, Plants, fractals, formal languages, Computer Graphics 18 (3) (1984) 1–10.
[9] P. Prusinkiewicz, A. Lindenmayer, J. Hanan, Developmental models of herbaceous plants for computer imagery purposes, Computer Graphics 22 (4) (1988) 141–150.
[10] P. Prusinkiewicz, A. Lindenmayer, The Algorithmic Beauty of Plants, Springer-Verlag, New York, 1990.
[11] P. Prusinkiewicz, M. James, R. Měch, Synthetic topiary, in: Proceedings of SIGGRAPH 94, 1994, pp. 351–358.
[12] R. Měch, P. Prusinkiewicz, Visual models of plants interacting with their environment, in: Proceedings of SIGGRAPH 96, 1996, pp. 397–410.
[13] W.T. Reeves, Approximate and probabilistic algorithm for shading and rendering structured particle systems, in: Proceedings of SIGGRAPH 85, 1985, pp. 313–322.
[14] P.E. Oppenheimer, Real time design and animation of fractal plants and trees, in: Proceedings of SIGGRAPH 86, 1986, pp. 55–64.
[15] J. Weber, J. Penn, Creation and rendering of realistic trees, in: Proceedings of SIGGRAPH 95, 1995, pp. 119–128.
[16] P. Prusinkiewicz, M.S. Hammel, E. Mjolsness, Animation of plant development, in: Proceedings of SIGGRAPH 93, 1993, pp. 351–360.
[17] T. Sakaguchi, J. Ohya, Modeling and animation of botanical trees for interactive virtual environment, in: Proceedings of the ACM Symposium on Virtual Reality Software and Technology, 1999, pp. 139–146.
[18] J.C. Wong, A. Datta, Animating real-time realistic movements in small plants, in: Proceedings of 2nd International Conference on Computer Graphics and Interactive Techniques in Australasia and South East Asia, 2004, pp. 182–189.
[19] S. Lefebvre, F. Neyret, Synthesizing bark, in: Proceedings of 13th Eurographics Workshop on Rendering, 2002, pp. 105–116 and 323.
[20] J. Lluch, E. Camahort, R. Vivó, Procedural multiresolution for plant and tree rendering, in: Proceedings of 2nd International Conference on Computer Graphics, Virtual Reality, Visualisation and Interaction in Africa, 2003, pp. 31–37.
[21] O. Deussen, P. Hanrahan, B. Lintermann, R. Měch, M. Pharr, P. Prusinkiewicz, Realistic modeling and rendering of plant ecosystems, in: Proceedings of SIGGRAPH 98, 1998, pp. 275–286.
[22] P. Prusinkiewicz, Simulation modeling of plants and plant ecosystem, Communications of the ACM 43 (7) (2000) 84–93.
[23] C. Slama, Manual of Photogrammetry, fourth ed., American Society of Photogrammetry, Falls Church, VA, USA, 1980.
[24] R.K. Lenz, R.Y. Tsai, Techniques for calibration of the scale factor and image center for high accuracy 3-D machine vision, IEEE Transactions on Pattern Analysis and Machine Intelligence 10 (5) (1988) 713–720.
[25] B. Caprile, V. Torre, Using vanishing points for camera calibration, International Journal of Computer Vision (1990) 127–140.
[26] P. Beardsley, D. Murray, A. Zisserman, Camera calibration using multiple images, in: Computer Vision—ECCV'92, Lecture Notes in Computer Science, vol. 588, Springer-Verlag, Berlin, 1992, pp. 312–320.
[27] Z. Zhang, A flexible new technique for camera calibration, IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (11) (2000) 1330–1334.
[28] R. Hartley, A. Zisserman, Multiple View Geometry in Computer Vision, second ed., Cambridge University Press, Cambridge, MA, 2004, ISBN: 0521540518.
[29] Z. Zhang, Determining the epipolar geometry and its uncertainty: a review, International Journal of Computer Vision 27 (2) (1998) 161–195.
[30] X. Armangué, J. Salvi, Overall view regarding fundamental matrix estimation, Image and Vision Computing 21 (2003) 205–220.
[31] O.D. Faugeras, Q. Luong, S.J. Maybank, Camera self-calibration: theory and experiments, in: Computer Vision—ECCV'92, Lecture Notes in Computer Science, vol. 588, Springer-Verlag, Berlin, 1992, pp. 321–334.
[32] S.J. Maybank, O.D. Faugeras, A theory of self-calibration of a moving camera, International Journal of Computer Vision 8 (2) (1992) 123–151.
[33] R.I. Hartley, Euclidean reconstruction from uncalibrated views, in: Applications of Invariance in Computer Vision, Lecture Notes in Computer Science, vol. 825, Springer-Verlag, Berlin, 1994, pp. 237–256.
[34] M. Armstrong, A. Zisserman, R. Hartley, Self-calibration from image triplets, in: Computer Vision—ECCV'96, Lecture Notes in Computer Science, vol. 1064, Springer-Verlag, Berlin, 1996, pp. 3–16.
[35] B. Triggs, Autocalibration and the absolute quadric, in: Proceedings of International Conference on Computer Vision and Pattern Recognition, 1997, pp. 609–614.
[36] Q.-T. Luong, O.D. Faugeras, Self-calibration of a moving camera from point correspondences and fundamental matrices, International Journal of Computer Vision 22 (3) (1997) 261–289.
[37] S. Bougnoux, From projective to Euclidean space under any practical situation, a criticism of self-calibration, in: Proceedings of 6th International Conference on Computer Vision, 1998, pp. 790–796.
[38] M. Pollefeys, L.V. Gool, A stratified approach to metric self-calibration, in: Proceedings of International Conference on Computer Vision and Pattern Recognition, 1997, pp. 407–412.
[39] M. Pollefeys, Self-calibration and metric 3D reconstruction from uncalibrated image sequences, Ph.D. thesis, Katholieke Universiteit Leuven, 1999.
[40] M. Pollefeys, L.V. Gool, Stratified self-calibration with the modulus constraint, IEEE Transactions on Pattern Analysis and Machine Intelligence 21 (8) (1999) 707–724.
[41] M. Pollefeys, Self-calibration and metric reconstruction in spite of varying and unknown intrinsic camera parameters, International Journal of Computer Vision 32 (1) (1999) 7–25.
[42] G.J. McLachlan, T. Krishnan, The EM Algorithm and Extensions, John Wiley and Sons, 1997.


[43] S. Belongie, C. Carson, H. Greenspan, J. Malik, Color- and texture-based image segmentation using EM and its application to content-based image retrieval, in: Proceedings of 6th International Conference on Computer Vision, 1998, pp. 675–682.
[44] Y. Li, J. Sun, C.-K. Tang, H.-Y. Shum, Lazy snapping, ACM Transactions on Graphics 23 (3) (2004) 303–308 (Special Issue: Proceedings of the 2004 SIGGRAPH Conference).
[45] C. Rother, V. Kolmogorov, A. Blake, GrabCut: interactive foreground extraction using iterated graph cuts, ACM Transactions on Graphics 23 (3) (2004) 309–314 (Special Issue: Proceedings of the 2004 SIGGRAPH Conference).
[46] L. Vincent, P. Soille, Watersheds in digital spaces: an efficient algorithm based on immersion simulations, IEEE Transactions on Pattern Analysis and Machine Intelligence 13 (6) (1991) 583–598.
[47] Y. Boykov, V. Kolmogorov, An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision, IEEE Transactions on Pattern Analysis and Machine Intelligence 26 (9) (2004) 1124–1137.
[48] Y.S. Chen, Hidden deletable pixel detection using vector analysis in parallel thinning to obtain bias-reduced skeletons, Computer Vision and Image Understanding 71 (3) (1998) 294–311.

[49] C. Teng, Y. Chen, W. Hsu, Camera self-calibration method suitable for variant camera constraints, Applied Optics 45 (4) (2006) 688–696.
[50] R.I. Hartley, Estimation of relative camera positions for uncalibrated cameras, in: Computer Vision—ECCV'92, Lecture Notes in Computer Science, vol. 588, Springer-Verlag, Berlin, 1992, pp. 579–587.
[51] E.K.P. Chong, S.H. Żak, An Introduction to Optimization, second ed., John Wiley and Sons, 2001.
[52] P. Sturm, Critical motion sequences for the self-calibration of cameras and stereo systems with variable focal length, Image and Vision Computing 20 (2002) 415–426.
[53] F. Kahl, J. August, Multiview reconstruction of space curves, in: Proceedings of International Conference on Computer Vision, vol. 2, 2003, pp. 1017–1024.
[54] F.S. Hill, Computer Graphics Using OpenGL, second ed., Prentice-Hall, Englewood Cliffs, NJ, 2001.
[55] J. Lluch, R. Vivó, C. Monserrat, Modeling tree structures using a single polygonal mesh, Graphical Models 66 (2004) 89–101.
[56] C.-H. Teng, S. Lai, Y. Chen, W. Hsu, Accurate optical flow computation under non-uniform brightness variations, Computer Vision and Image Understanding 97 (2005) 315–346.
[57] C. Harris, M. Stephens, A combined corner and edge detector, in: Proceedings of Fourth Alvey Vision Conference, 1988, pp. 147–152.