An optimized in vivo multiple-baseline stereo imaging system for skin wrinkles


Optics Communications 283 (2010) 4840–4845


Optics Communications journal homepage: www.elsevier.com/locate/optcom

Onseok Lee a,b, Gunwoo Lee a,b, Jangseok Oh c, Mingi Kim c,⁎, Chilhwan Oh a,b,d,⁎

a Biomedical Engineering, Biomedical Science of Brain Korea 21, Korea University College of Medicine, Anam-dong, Seongbuk-gu, Seoul, 136-705, Republic of Korea
b Research Institute for Skin Image, Korea University Medical Center, Guro2-dong, Guro-gu, Seoul, 152-703, Republic of Korea
c 3D Information Processing Laboratory, Korea University, Anam-dong, Seongbuk-gu, Seoul, 136-705, Republic of Korea
d Department of Dermatology, Korea University Guro Hospital, Korea University College of Medicine, Guro2-dong, Guro-gu, Seoul, 152-703, Republic of Korea

Article info

Article history: Received 18 February 2010; Received in revised form 1 July 2010; Accepted 1 July 2010

Keywords: Stereo image; Baseline; Skin; Disparity; Graph cut

Abstract

Recent developments in computer vision have attempted to mimic human visual physiology. One application of this technology is the evaluation of skin wrinkles. We have developed a non-convergence stereo system that can be calibrated in vivo and whose baseline can be controlled at the micrometer level. We are able to obtain more accurate 3D information by calibrating the nonlinear relation between the disparity of an object and its depth. © 2010 Elsevier B.V. All rights reserved.

⁎ Corresponding authors. Oh is to be contacted at Department of Dermatology, Korea University Guro Hospital, Korea University, College of Medicine, 97 Guro-dong gil, Guro-dong, Guro-gu, Seoul, 152-703, Republic of Korea. Tel.: +82 2 2626 1894; fax: +82 2 2626 1900. Kim, 3D Information Processing Lab., Korea University, 126-1 Anam-dong 5ga, Seongbuk-gu, Seoul, 136-701, Republic of Korea. Tel./fax: +82 2 3290 3977.
E-mail addresses: [email protected] (M. Kim), [email protected] (C. Oh).
0030-4018/$ – see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.optcom.2010.07.005

1. Introduction

Stereo vision can be used to obtain three-dimensional (3D) information from disparity, the difference in position between the projections of the same point in 3D space onto two images [1,2]. The disparity d is related to the depth z by

$$d = BF\,\frac{1}{z} \qquad (1)$$

where B and F are the baseline and the focal length, respectively. In other words, for objects with a limited range of depth, a larger baseline (the distance between the optical axes of the two cameras) gives better precision, because the same depth difference is expressed as a larger disparity. However, the larger the baseline, the longer the line that must be searched during stereo matching, which can degrade correctness, leading to a tradeoff between precision and correctness [3]. Additionally, Eq. (1) shows that the relation between depth and disparity is nonlinear. Thus, to obtain correct and quantitative 3D information using stereo images, both the correct choice of baseline and the calibration of the disparity are important [4].

Stereo camera systems are necessary to obtain 3D information from stereo images. These can be divided into non-convergence and

convergence camera models [1,2,5]. Non-convergence cameras obtain stereo images without vertical disparity by arranging the optical axes of the two cameras in parallel [6,7]. As it is difficult to satisfy this condition, convergence camera models, which do have vertical disparity, are normally used. These require a complicated stereo matching procedure, and rectification is necessary to remove the vertical disparity [1,2]. Rectification improves calculation speed and correctness by reducing a 2D search problem, with vertical and horizontal components, to a 1D problem with only a horizontal component.

Recently, much research has attempted to replicate the human visual system via computer [8]. This research has many applications, including the evaluation of skin diseases in the medical field [5,9–11]. However, existing systems cannot select suitable baseline images for calibration because the baseline is fixed. These systems use a convergence camera model and create stereo images with a mirror-based stereo system set at multiple angles, so that even a weak impact can change the reflection angle [12]. Technology that shifts the camera finely along the X, Y, and Z axes can control the baseline and obtain non-convergence stereo images. However, it cannot obtain the right and left images simultaneously, so it is unsuitable for medical applications that need in vivo stereo images. Imaging a silicone replica of living skin removes the changes between the right and left images in the region of interest (ROI) caused by breathing and other movements, but it introduces errors in the 3D information because minute bubbles form in the silicone when the replica is made.

Accordingly, obtaining more correct and quantitative 3D information using stereo images in the medical field requires: (1) a system capable of fine baseline control at high magnification, in order to


obtain a suitable baseline for a stereo image of living skin; (2) a system that obtains non-convergence stereo images, so that matching is fast and correct and no rectification process is required; and (3) an in vivo system that obtains the right and left images simultaneously without rectification. We constructed a system that satisfies all of the above conditions and deduces correct 3D information by calibrating the nonlinear relationship between disparity and depth information.

2. Materials and methods

Fig. 1 is a schematic of the main body required to realize a system appropriate for the purposes of this study. The parts used to obtain the right and left images were built separately, while the zoom lens (low magnification Zoom 7000, NAVITA®, Japan) used to control magnification and the charge-coupled device (CCD) camera (GEV GP-3780C, GEViCAM Inc., USA) used to obtain images were built to common specifications. The part that obtains the left images (hereafter the shifting part) is attached to a manual stage (Parker Hannifin-Daedal Division, USA) and was designed to shift from side to side by a minimum of 5.08 μm (0.0002 inch) up to a maximum of tens of millimeters. This allows operators to choose the baseline that gives correct 3D information for the type of skin being visualized. The part that obtains the right images (hereafter the fixed part) was designed not to shift. The shifting and fixed parts are arranged at a right angle to each other to avoid a structural conflict that would prevent the two images from being obtained simultaneously, and a beam splitter (Green Optics Co., Ltd., Korea) is installed at the crossing point of the two optical axes to deliver the image to both cameras. The merit of this design is that it improves the correctness and speed of stereo matching, because it obtains non-convergence stereo images. Additionally, in vivo images can be obtained, because the two stereo images are captured simultaneously.
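Eq. (1) and the stage resolution quoted above can be explored numerically. The following is a minimal sketch, not a model of the actual instrument: the focal length, working distance, and wrinkle depth are hypothetical placeholder values (the text does not report them), and only the inch-based stage steps come from the text. Disparity is returned in the same length unit as the inputs, not in pixels.

```python
INCH_TO_UM = 25_400.0  # exact conversion: 1 inch = 25.4 mm


def stage_step_um(step_inch):
    """Convert a stage displacement from inches to micrometres."""
    return step_inch * INCH_TO_UM


def disparity(baseline, focal_length, depth):
    """Eq. (1): d = B * F / z, with all lengths in the same unit."""
    return baseline * focal_length / depth


# Hypothetical optics, chosen only to make the numbers readable.
FOCAL_UM = 50_000.0   # assumed focal length F
NEAR_UM = 100_000.0   # assumed distance z to the top of a wrinkle
RELIEF_UM = 500.0     # assumed wrinkle depth

for n in range(1, 6):  # baselines 1B..5B in steps of 0.001 inch
    b = stage_step_um(0.001 * n)
    d_top = disparity(b, FOCAL_UM, NEAR_UM)
    d_bottom = disparity(b, FOCAL_UM, NEAR_UM + RELIEF_UM)
    print(f"{n}B = {b:6.1f} um: disparity difference {d_top - d_bottom:.4f} um")
```

Under these assumptions the disparity difference produced by a fixed wrinkle depth grows in proportion to the baseline, which is the precision side of the tradeoff described in the Introduction. Converting to pixels would require the sensor pixel pitch, which is not given here.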
The main body of the apparatus can obtain in vivo images of living skin because it can be positioned at the points selected for calibration in 3D space using an articulating stand (Articulating Arm Boom Stand, EO® Edmund Optics Inc., USA). The images captured by the CCD camera are sent to a computer using a Dual Port Gigabit Ethernet Server NIC (INTEL® PRO/1000, USA)


for speed. The main body uses a light emitting diode (LED) ring light and an LED power supply (SEOKWANG CO., LTD., Korea) to allow fine focusing control. Additionally, we included a diffusion filter (Hongseong optics CO., LTD., Korea) to distribute the light evenly. Even so, operators should not expect perfectly even light distribution in this system, because refraction and reflection at the round beam splitter and light guide cause the two images to differ in intensity. Meanwhile, pixel intensity in the right and left images is a main factor in determining stereo matching points [2]. Uneven light distribution therefore decreases the correctness of stereo matching, and so each image is calibrated by subtraction from a white reference image.

Stereo matching of the images from this system yields a disparity map, in which the disparity of each matching point is converted to intensity. Disparity can be calculated while searching the right image on the basis of the left image, because the similarity between corresponding points of the two images must be judged first. We obtained the disparity map through graph cut, which is known to give excellent results [13–15]. Our results show that we can select a baseline by calculating the disparity map from in vivo multiple-baseline stereo images of skin wrinkles, obtained with a non-convergence camera system, and comparing the results from each baseline.

As mentioned above, the relationship between disparity and depth information is nonlinear. Therefore, to deliver correct 3D information, we must first accomplish stereo matching and then calibrate the disparity map properly. Knowing the exact numerical relationship between the two quantities is indispensable. Accordingly, we built a standard scalar bar, shown in Fig. 2(a). There are 20 lines, drawn by a laser, on both sides of an aluminum plate of 1000 μm thickness.
The distance from the ends to the center is divided into 50 μm increments to show differences of depth. Fig. 2(b) is an overlap image of a stereo pair of the left side of the standard scalar bar, taken with the in vivo multiple-baseline system. When we assume that the line with the least depth between the object and the camera is at 0 μm, we can find the difference in disparity up to the line at 1000 μm. In other words, we can confirm that the disparity increases as the line comes closer to the camera. In the right and left images photographed by this apparatus, the disparity between corresponding points is calculated by pixel counting. From this result, the formula relating disparity and depth can be found by curve fitting. We can expect the 3D information to be closer to actual data when this formula is combined with the existing results.

3. Results
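The calibration step described in Section 2, fitting a formula to the disparities measured on the standard scalar bar, could look like the sketch below. The functional form is an assumption: the text does not state which curve was fitted, so the sketch uses depth = c0 + c1 / disparity, which follows from Eq. (1) when depth is measured relative to the nearest line. The calibration data are synthetic stand-ins generated from hypothetical coefficients, not measurements from this study.

```python
def fit_inverse_model(disparities, depths):
    """Least-squares fit of depth = c0 + c1 / disparity.

    The model is linear in x = 1 / disparity, so ordinary linear
    regression on (x, depth) gives the two coefficients directly.
    """
    xs = [1.0 / d for d in disparities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(depths) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, depths))
    c1 = sxy / sxx
    c0 = mean_y - c1 * mean_x
    return c0, c1


def disparity_to_depth(d, c0, c1):
    """Apply the fitted calibration curve to one disparity value."""
    return c0 + c1 / d


# Synthetic stand-in for 18 usable lines spaced 50 um apart in depth.
TRUE_C0, TRUE_C1 = -20_000.0, 600_000.0       # hypothetical coefficients
line_depths = [50.0 * i for i in range(18)]   # 0, 50, ..., 850 um
line_disparities = [TRUE_C1 / (h - TRUE_C0) for h in line_depths]

c0, c1 = fit_inverse_model(line_disparities, line_depths)
print(f"fitted c0 = {c0:.3f}, c1 = {c1:.3f}")
print(f"depth at disparity {line_disparities[5]:.4f}: "
      f"{disparity_to_depth(line_disparities[5], c0, c1):.3f} um")
```

With real measurements the disparities would be integer pixel counts, so the fit would not be exact and a residual check against the known 50 μm spacing would be the natural sanity test.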

Fig. 1. Schematic for the main apparatus of the in vivo multiple-baseline stereo imaging system.

Fig. 3(a) is a picture of living skin obtained using the in vivo multiple-baseline stereo imaging system. The stereo image was obtained by positioning the main body close to the living skin. The cameras were set to produce TIFF images with a resolution of 1032 × 779 pixels at 24-bit color depth and the best available image quality. The available field of view (FOV) ranged from a minimum of 5 × 3.8 mm to a maximum of 80 × 60.4 mm. Fig. 3(b)–(c) shows in vivo stereo images of skin wrinkles obtained by this system. Fig. 3(b) is the overlap of the right and left images obtained when the baseline was 0 (OL = OR), set by aligning the optical axes of the two cameras. It shows a clear skin feature that seems to come from one camera, although two images are actually overlapped. This shows that, by obtaining the right and left images simultaneously, the system can obtain in vivo images without being affected by minute movements caused by breathing or other motion. In Fig. 3(c), which was obtained by changing the baseline to 76.2 μm, we see a blurred image caused by the disparity between the right and left images. In other words, this system can obtain stereo images at the desired baseline. However, it is difficult to judge the depth of skin wrinkles with


Fig. 2. Standard scalar bar: (a) construction of the standard scalar bar; (b) overlap images of the standard scalar bar photographed by the in vivo multiple-baseline stereo imaging system (baseline: 25.4 μm, FOV: 20 × 15 mm).

only a single image, so we express depth information as intensity through stereo matching.

Fig. 4 shows the disparity maps obtained by stereo matching of stereo images of skin wrinkles taken under multiple-baseline conditions. As the camera used to obtain the right image is fixed, the right image is selected as the base image (Fig. 4(a)). Images (b) to (f) are the results of stereo matching as the baseline is broadened (1B to 5B) at intervals of 25.4 μm. Though the stereo images are obtained from the same ROI, the results differ depending on the baseline. Looking at image (b) (1B), we can infer that the center of the ROI is close to the camera and that its right and left sides are comparatively far from the camera. We can also see that the slant of the skin's surface on the right side of the ROI is greater than in other regions. For a more detailed understanding, we mark a region that shows wrinkles relatively well, owing to the slight slant of the skin surface, as ROI ‘1’ and enlarge this 220 × 220 pixel region. We also mark a region in which the wrinkles are far from the camera, owing to the abrupt slant of the skin's surface, as ROI ‘2’ and place its enlargement on the right side. Though ROI ‘1’ shows wrinkles relatively well in (b), ROI ‘2’ cannot show wrinkles exactly and expresses mostly surface slant, because the slant of the body surface is larger than the wrinkle depth. In other words, it can be difficult to confirm wrinkles through the disparity map in regions with a large depth range. In contrast, we can confirm wrinkles more precisely in ROI ‘2’ as the baseline increases (2B, 3B). However, as the baseline increases even more (4B, 5B), the correctness of stereo matching deteriorates and the results worsen. In other words, we must select the proper baseline to find minute

Fig. 3. The in vivo multiple-baseline stereo imaging system and an overlap image of skin wrinkles obtained by the system (FOV: 20 × 15 mm): (a) living skin image obtained by the in vivo multiple-baseline stereo imaging system; (b) overlap image (baseline: 0 μm); (c) overlap image (baseline: 76.2 μm).

wrinkles, which we can do through this system as it allows us to control the baseline.

Fig. 5(a)–(b) shows the results of curve fitting using the standard scalar bar at 2B and 3B, respectively, the baselines that express the wrinkles of Fig. 4(a) comparatively well. The disparity, in pixels, between the right and left images for eighteen lines is marked on the x axis and the actual


Fig. 4. The results of stereo matching for skin wrinkles in accordance with the change of baseline (FOV: 20 × 15 mm): (a) base image; (b) 1B: 25.4 μm (0.001 inch); (c) 2B: 50.8 μm (0.002 inch); (d) 3B: 76.2 μm (0.003 inch); (e) 4B: 101.6 μm (0.004 inch); (f) 5B: 127.0 μm (0.005 inch).

depths matching the lines are marked on the y axis. Though each image contains 20 lines in total, we use only 18 because of occlusion under the given baseline condition. To apply this to the existing results, we created the data from the same 20 × 15 mm FOV. As shown in Eq. (1), depth and disparity are nonlinearly related. This means that an uncalibrated disparity delivers a value that includes error when we want the depth of an object as a 3D value. Meanwhile, precision increases as the baseline increases, so the range of disparity expressed in the image also increases (five disparity levels at 2B, six at 3B). However, this difference is not large, because the difference in baseline is small. Fig. 5(c)–(d) shows the result of calibration by applying the formulas of Fig. 5(a)–(b) to the results of Fig. 4(c)–(d). In other words, using curve fitting gives results that are closer to actual data.

4. Discussion

Though increasing and decreasing the baseline brings about problems in precision and correctness, occlusion change is also an

important factor [2,16,17]. As the region of occlusion, which is seen from one side but not from the opposite side, increases in proportion to the baseline, it is a problem that must be resolved at the time of 3D reconstruction [2,16,17]. As the skin wrinkles imaged in this study generate relatively few occlusions compared with a general stereo image of an object having large depth differences, they are good candidates for stereo imaging. A non-convergence camera model is expected to improve the correctness of stereo matching and to process images quickly [2,5]. Convergence stereo images can be expected to improve in correctness through rectification, but rectification is not suitable for medical applications because it may distort the images. Accordingly, a non-convergence model is suitable for the purposes of this study. Although a parallel camera structure is feasible for conventional systems with baselines of a few centimeters (the distance between two eyes), the same structure cannot be used in a high-magnification imaging system that needs a minute baseline of tens of micrometers. To overcome this problem, a single lens camera system with a micro-prism array is one alternative, but, with this approach, baseline


Fig. 5. The relationship between disparity and real depth calibrated by the standard scalar bar at 2B and 3B, respectively, and the disparity map calibrated using this information: (a) 2B: 50.8 μm (0.002 inch); (b) 3B: 76.2 μm (0.003 inch); (c) 2B: 50.8 μm (0.002 inch); (d) 3B: 76.2 μm (0.003 inch).

control becomes difficult [18–20]. Our approach, in which the cameras obtain images divided by a beam splitter at a right angle, is simple and easy to implement.

Various algorithms can be used for stereo matching in this system. Area based methods produce matching errors in areas with little or repeated texture and in the areas surrounding depth discontinuities. Area based methods may also yield incorrect results when the image is noisy [13,21]. Disparity maps should express smooth disparity values over the continuously curved surfaces depicted in an image, as well as clear boundaries. However, when using a window of fixed size, it is difficult to satisfy these two conditions at the same time. In other words, when smooth disparity values are expressed well, the boundary of the curved surface becomes unclear, and to visualize a clear boundary it becomes necessary to accept a great deal of noise in the rest of the image. These problems may be ameliorated by using a large window to assess and reduce noise and a small window to determine boundaries, but the two are difficult to apply simultaneously. Many methods have been suggested to resolve this problem. They may be classified into two types: adaptive window methods [21], which change the shape or size of the window depending on the area that is visualized, and optimum solution methods, which globally apply known constraints of continuity and uniqueness to the matching process [13–15]. Adaptive window methods measure the changes in intensity within the window and the changes in disparity acquired at every step, and update the window's shape and the disparity to yield minimum uncertainty. Although the update rules of the adaptive window methods are well derived mathematically, the measured results are not better than those of the fixed window method, because the algorithm is too sensitive to initial assumptions [21].
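The fixed-window, area based matching discussed above can be sketched as follows. This is an illustrative sketch rather than the method used in this study (which is graph cut): it uses the sum of squared differences as the window cost, searches along a single row as a non-convergence geometry permits, and runs on a synthetic random texture with a known shift. All names and the sign convention for disparity are assumptions made for the example.

```python
import random


def ssd_cost(base, other, row, col, disp, half):
    """Sum of squared differences between a (2*half+1)^2 window centred at
    (row, col) in `base` and the window shifted left by `disp` in `other`."""
    total = 0
    for dr in range(-half, half + 1):
        for dc in range(-half, half + 1):
            diff = base[row + dr][col + dc] - other[row + dr][col + dc - disp]
            total += diff * diff
    return total


def match_pixel(base, other, row, col, max_disp, half):
    """Return the disparity in 0..max_disp with the lowest window cost.

    The search is 1D (same row only). The caller must keep the window
    inside the image on the top, bottom, and right sides.
    """
    best_disp, best_cost = 0, float("inf")
    for disp in range(max_disp + 1):
        if col - half - disp < 0:  # window would leave the image on the left
            break
        cost = ssd_cost(base, other, row, col, disp, half)
        if cost < best_cost:
            best_disp, best_cost = disp, cost
    return best_disp


# Synthetic test pair: `other` is `base` shifted by a known disparity.
rng = random.Random(0)
HEIGHT, WIDTH, TRUE_DISP = 12, 24, 3
base = [[rng.randrange(256) for _ in range(WIDTH)] for _ in range(HEIGHT)]
other = [[base[r][c + TRUE_DISP] if c + TRUE_DISP < WIDTH else 0
          for c in range(WIDTH)] for r in range(HEIGHT)]

print(match_pixel(base, other, row=6, col=12, max_disp=6, half=2))
```

The `half` parameter is the window size that drives the tradeoff described above: a larger window averages out noise but smears depth boundaries, while a smaller one keeps boundaries sharp at the cost of more ambiguous matches.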

On the other hand, methods that prioritize finding an optimized solution under global constraints usually yield correct matches. However, some of them yield incorrect matches in areas with little texture and blur the region around depth discontinuities [13]. There are area based algorithms, such as normalized cross correlation (NCC) [13], sum of squared differences (SSD) [13], and mean of absolute differences (MAD) [7,13], and energy function based algorithms, such as dynamic programming [17,22,23] and graph cut [13–15]. Energy minimization based on graph cuts builds a directed, weighted graph to minimize the energy function between two images and treats matching as a labeling problem. Existing dynamic programming limits the energy function to 1D, so it cannot be used effectively in 2D. However, the graph cut method, which can handle multi-dimensional functions, is known to bring better results than other methods [13–15]. Graph cut globally minimizes an energy function consisting of a data term and a smoothness term, and yields good results in various engineering applications [13]. Images of skin wrinkles, like those used in this study, embody many typical challenges to stereo matching, because they involve repeated patterns and because many lines of discontinuity are found in the images. Accordingly, by selecting a graph cut algorithm, which optimizes the expression of disparity in stereo images, it is possible to identify the best disparity map among the available options. The quality of graph cut results depends on the disparity range and lambda. For this study, we defined the disparity range as −15 and lambda as 10, which are values known to describe disparity maps well [14,15]. Recently, other values for the disparity range and lambda have been shown to yield better results in certain contexts, depending on texture [13].
Therefore, better quantitative analyses may be achieved by considering the characteristics of the living body under study.
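The role of lambda in an energy made of a data term and a smoothness term can be illustrated with a small evaluation sketch. It only computes the energy of a given labeling; it does not perform the graph cut minimization itself. The Potts smoothness term (a fixed penalty for each pair of neighbouring pixels with different labels) and the toy cost values are assumptions made for illustration; the formulation actually used follows [13–15].

```python
def energy(labels, data_cost, lam):
    """E(f) = sum_p D_p(f_p) + lam * (number of 4-neighbour pairs with
    different labels). `data_cost[r][c][k]` is the cost of giving pixel
    (r, c) the disparity label k."""
    rows, cols = len(labels), len(labels[0])
    data = sum(data_cost[r][c][labels[r][c]]
               for r in range(rows) for c in range(cols))
    smooth = 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols and labels[r][c] != labels[r][c + 1]:
                smooth += 1
            if r + 1 < rows and labels[r][c] != labels[r + 1][c]:
                smooth += 1
    return data + lam * smooth


# Toy 3 x 3 problem with two labels: every pixel prefers label 0 except
# the centre, whose (noisy) data cost prefers label 1.
cost = [[[0, 5] for _ in range(3)] for _ in range(3)]
cost[1][1] = [5, 0]

flat = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]    # smooth labeling
spiky = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]   # follows the data at the centre

for lam in (1, 10):
    print(f"lambda={lam}: flat={energy(flat, cost, lam)}, "
          f"spiky={energy(spiky, cost, lam)}")
```

In this toy case a small lambda favours the labeling that follows the data at every pixel, while lambda = 10 makes the smooth labeling cheaper, which is why the appropriate value depends on the texture being imaged.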


In the field of stereo vision, stereo matching is the main topic of study, and various studies have looked at applying the techniques to the medical field [5,9–11]. Though the relationship between disparity and depth is nonlinear, calibration is generally not treated as an important issue, because ignoring it simplifies calculations. However, in quantitative medical applications, calibration is essential. We present a new method to calibrate the nonlinear relation between disparity and depth for medical applications, which improves the data on real depth information. Comparing and verifying error rates against a ground truth is a widely known way to evaluate the results of stereo matching [13]. However, this method cannot be applied here, because no ground truth based on high-magnification stereo images of living skin exists. Therefore, an indirect evaluation by dermatologists who are knowledgeable about the morphology of living skin will give the best results.

5. Conclusion

We designed a system that obtains stereo images in vivo using a non-convergence camera model and makes it possible to control the baseline for medical applications of stereo vision. Additionally, we suggest a better method to approach real depth information by calibrating the nonlinear relation between depth and disparity. The proposed system not only overcomes the problems caused by lack of calibration but also obtains surface images directly, so it can supply reliable data and is expected to save the expense of making replicas and related calibration. Furthermore, because it can obtain in vivo images, it can be applied to more diverse analyses and applications in collaboration with other researchers.

Conflict of Interest

The authors state that they have no conflicts of interest.


Acknowledgments

The generous support of MyoungKi Ahn at Korea Advanced Institute of Science and Technology (KAIST) is gratefully acknowledged. This work was also supported by the Seoul Research and Business Development Program (grant 10574) and by a Korea University grant (K0717401).

References

[1] A. Fusiello, Mach. Vis. Appl. 12 (2000) 16.
[2] M.Z. Brown, IEEE Trans. Pattern Anal. Mach. Intell. 25 (2003) 993.
[3] M. Okutomi, IEEE Trans. Pattern Anal. Mach. Intell. 15 (1993) 353.
[4] F. Matsuura, N. Fujisawa, J. Visual. 11 (2008) 79.
[5] J. Moon, Physiol. Meas. 23 (2002) 247.
[6] J.-S. Lee, C.-W. Seo, E.-S. Kim, Opt. Commun. 200 (2001) 73.
[7] K.-H. Bae, J.-S. Koo, E.-S. Kim, Opt. Commun. 221 (2003) 23.
[8] K. Iizuka, Appl. Opt. 44 (2005) 7083.
[9] M.G. Kim, J. Dermatol. Sci. 35 (2004) 125.
[10] S.W. Son, Skin Res. Technol. 11 (2005) 272.
[11] O. Lee, Int. J. Cosmetic Sci. 29 (2007) 227.
[12] J. Zhu, Design and calibration of a single-camera-based stereo vision sensor, Optical Engineering (2006) 083001.
[13] D. Scharstein, R. Szeliski, Int. J. Comput. Vis. 47 (2002) 7.
[14] Y. Boykov, O. Veksler, R. Zabih, IEEE Trans. Pattern Anal. Mach. Intell. 23 (2001) 1222.
[15] V. Kolmogorov, R. Zabih, Proceedings. Eighth IEEE International Conference on, 7-14 July 2001, vol.1, 2001, p. 508.
[16] C.L. Zitnick, IEEE Trans. Pattern Anal. Mach. Intell. 22 (2000) 675.
[17] I.J. Cox, Comput. Vision Image Understanding 63 (1996) 542.
[18] C.Y. Chen, Opt. Express 16 (2008) 15495.
[19] F. Matsuura, J. Visual. 11 (2008) 79.
[20] D.H. Lee, IEEE Trans. Robot. Autom. 16 (2000) 528.
[21] T. Kanade, IEEE Trans. Pattern Anal. Mach. Intell. 16 (1994) 920.
[22] Y. Ohta, IEEE Trans. Pattern Anal. Mach. Intell. 7 (1985) 139.
[23] P.N. Belhumeur, Int. J. Comput. Vis. 19 (1996) 237.