CHAPTER 8

Image Processing at Your Fingertips: The New Horizon of Mobile Imaging

Xin Li
Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV, USA
4.08.1 Historical background and overview

The rapid evolution of sensing, computing, and communication technologies has redefined the way we collect, share, and interact with digital media. In particular, the art of electronic imaging (image sensing or acquisition) has witnessed two new trends in recent years. First, the miniaturization of imaging devices has facilitated their use in various mobile applications, ranging from smartphones and tablets to wireless and sensor networks. Second, the integration of sensing with computing has given birth to several emerging fields such as computational photography, compressed sensing, and cyber-physical systems. These technological advances motivate us to take a fresh look at a conventional field such as image processing—namely, how will image processing evolve as a consequence of miniaturization and integration?

The impact seems at least twofold. On one hand, mobile devices such as smartphones often have more severe power and bandwidth constraints than non-mobile ones. Therefore, it is desirable to allocate limited resources to preserve the image content of the highest interest—e.g., objects in the foreground should be given higher priority than those in the background. In fact, such an object-oriented concept has already been implemented optically by single-lens reflex (SLR) imaging (http://en.wikipedia.org/wiki/Single-lens_reflex_camera), at the price of higher hardware cost. Given the low cost of CMOS sensors, can image processing be combined with the interactivity of smartphones to realize this concept under the framework of mobile imaging? On the other hand, despite the success of multi-touch and pressure-sensitive interfaces with large touchscreens, their potential on small touch devices remains unleashed. At the intersection of mobile computing and human-computer interaction, there still exists plenty of "no-man's land" to be explored.

The long-term objective along this line of reasoning is to think of mobile imaging as an extended sensing platform and to enrich one's sensing capability by tapping into the network of peer users. As a first step, we propose that image processing at the fingertips is a timely topic deserving our community's attention. Why is it important? We argue that recasting image processing under a mobile imaging/computing framework could shed new insight on various analysis and restoration tasks that can benefit from the interaction between humans and mobile devices.
Meanwhile, new technical challenges related to the small size of the display screen, the limitation of battery power, and constraints on viewing conditions imply plenty of opportunities for engineers to optimize their designs and implementations. In particular, the joint optimization of imaging and processing in a closed-loop fashion is likely to play an increasingly important role in various emerging fields, including computational photography and cyber-physical systems.

In this note, we will first review the two threads of technological evolution, mobile imaging and mobile OS, in Sections 4.08.2 and 4.08.3. They have followed different evolutionary paths but share many similar patterns, such as co-evolution with microelectronics and networks. We will then elaborate on the implications of intertwining these two threads in Section 4.08.4. At their intersection, there exist plenty of fascinating opportunities for new developments and applications of image processing at the fingertips. A sampler of applications will be discussed in Section 4.08.5, and we will address several open issues in Section 4.08.6.
4.08.2 Mobile imaging: following Feynman's idea on infinitesimal machinery

"There's Plenty of Room at the Bottom" is the title of a lecture [1] given by the legendary physicist Richard Feynman at an American Physical Society meeting held at Caltech on December 29, 1959. At least two prophecies made by Feynman have come true: writing small (nanoscale lithography) and the miniaturization of computers (VLSI technologies). Although the problem of manipulating and controlling image sensors on a small scale did not appear on Feynman's original list, it has followed an evolutionary path similar to other technologies—the competition between charge-coupled device (CCD) and complementary metal oxide semiconductor (CMOS) technologies over the past four decades represents the challenge and promise of miniaturizing image sensors.

CCD and CMOS imagers were both invented in the late 1960s to early 1970s. However, their growth patterns have varied significantly, as pointed out in [2]—CCDs dominated the first 25 years, and CMOS only began to catch up in the mid-1990s, after the bottleneck caused by lithography technology was broken. The increasing market share of CMOS sensors over the past five years has been largely attributed to the growing popularity of smartphones. Therefore, we deem it a good lesson to learn from the winding road of CMOS sensors, as compared against the straight path of CCD sensors (more details can be found at http://www.teledynedalsa.com/corp/markets/ccd_vs_cmos.aspx). The moral of this story is that, as in many other technology sectors, the co-evolution of imaging and microfabrication offers a more reliable prediction than an isolated point of view.

Why did CCD win at the start? According to [2], the competitive edge of CCD sensors was largely due to their higher image quality (as measured by signal-to-noise ratio)—CMOS image sensors required more uniformity and smaller features than silicon wafer foundries could deliver at the time. The excellent noise performance of CCDs made them less demanding to engineer than CMOS under the fabrication technology available in the 1960s–1970s. In fact, the design of CCD sensors is so straightforward that it can be viewed as an over-simplified solution to the CMOS design (refer to [2]). The advantages of CMOS over CCD—including lower power consumption, camera-on-a-chip integration (e.g., [3]), and lowered fabrication costs from the reuse of mainstream logic and memory device fabrication—were not technologically feasible until the 1990s, when lithography and process control in microfabrication reached maturity.
How did CMOS catch up? As mentioned above, it was originally limited by lithography—a technology for writing at very small scales. Earlier lithography technology relied on chemical processes for printing; photolithography marks the more recent advance of using light to control the process of microfabrication [4]. It is interesting to point out that the key milestone of photolithography is related to the choice of light source. Before 1982, lamp-based photolithography was the mainstream, but it could not keep up with the semiconductor industry's demands for higher resolution and higher throughput. The pioneering invention developed by IBM researchers in 1982 [5]—namely, excimer laser lithography—delivered what the semiconductor industry required. From the sensor technology perspective, excimer laser lithography has also played a crucial role in revitalizing CMOS imaging since the mid-1990s.

It should be noted that CCD and CMOS will remain complementary solutions in the future. For example, CCD remains the favorite solution for high-end or professional-grade photography (e.g., expensive single-lens reflex cameras), while CMOS is more suitable for low-end or consumer-grade photography (e.g., cost-effective webcams or smartphone cameras). It has been shown that CCD and CMOS sensors have converged to practically indistinguishable solutions for low-resolution imaging (VGA and below) [6], while there is still a lot of room for the development of CMOS sensors for high-resolution imaging. Among various novel ideas, we choose to highlight the solution based on microlenses [7]—devices used to redirect light to the active area of each pixel—and the principle of plenoptic imaging [8] (e.g., the plenoptic camera with an array of 25 micro-cameras developed by Pelican Imaging, http://www.pelicanimaging.com/). It is difficult to predict how fast the science of miniaturization will evolve in the new century.
4.08.3 Mobile computing: interacting with the computer without an interface

Social interaction plays an essential role in the fabric of human societies. Advances in computing and communication technologies enable people to interact with each other anytime and anywhere. A key element of such a computer-based social computing/communication infrastructure is the interface between humans and computers/networks. From command lines to graphical interfaces, and from the sketch-pad to the multi-touch screen, the user interface facilitates the interaction between humans and machines. It is more art than science, because judgments about the quality of an interface often vary from individual to individual. The success of any product, such as a mouse or a tablet, is often jointly determined by various factors (e.g., marketing strategy, human psychology, and cultural background) more confounding than the technology itself.

Historically, human-computer interaction (HCI) has experienced several phases: text-based (before the 1980s), graphics-based (1990s–2000s), and touch-based (since 2005). Their evolutionary path can be understood from several complementary viewpoints. The trend is to interact with the computer without a mechanical interface—from a physical keyboard to a virtual one, from carrying a mouse to using one's own fingers. Such a design trend is in principle aligned with the idea of the multi-tool (e.g., the Swiss Army knife and the smartphone) and Feynman's tiny machines, because space limitation is a constant constraint on all technology-based gadgets.
The design often sacrifices certain features (e.g., typing speed or precision of control [9]) for other benefits such as mobility. We will take multi-touch as a concrete example to understand such a design tradeoff. The idea of the touchscreen dates back to as early as the 1970s [10]; the concept of multi-touch has also been studied by both academia (e.g., the University of Toronto [11]) and industry (e.g., Bell Labs [12]) since the early 1980s. One of the early breakthroughs was Pierre Wellner's DigitalDesk, developed at the University of Cambridge [13]. Other notable multi-touch systems developed in the past decade include DiamondTouch, developed by Mitsubishi [14], and the system developed by Han at NYU [15]. Whether you prefer Android, Symbian, or the iPhone, the incorporation of multi-touch technology into mobile devices has ushered in a new era of mobile computing. There has been increasing evidence that software/app development for mobile platforms is going to overtake that for desktops.

So why do people like to tap, press, or slide? On one hand, a touch-based interface offers a new experience for users—the curiosity of learning is the driving force underlying the growth of many new technologies, including smartphones. It is interesting to compare mouse-clicking and screen-tapping: even though the former is more precise, the latter implements a shortcut between humans and machines—there is no need to look for a mouse or its connections any more. Meanwhile, the touchscreen admits a rich set of extended modes beyond tapping—namely press, tilt, scratch, shake, and slide (single-finger vs. double-finger). These finger movements—when placed in a proper application context (e.g., gaming vs. reading)—open a door to novel and exciting experiences. On the other hand, the touchscreen is supported by many other desirable features of mobile devices, including their portability, the ease of turning them on/off, and software distribution. As the population of smartphone users keeps increasing, social networking applications are likely to become more tightly tied to mobile platforms. In this context, there exist plenty of opportunities for mobile imaging and image processing at the fingertips to further enrich the experience of social networking.
4.08.4 Image processing at fingertips: where mobile imaging meets mobile computing

Rapid advances in sensing and interface technologies enable us to take a fresh look at several conventional image processing tasks, including acquisition, segmentation, mosaicing, and restoration. The novel insight comes from how user feedback can improve the accuracy of manipulating images or enhance the experience of interacting with the image content.
4.08.4.1 Intelligent image acquisition

In conventional wisdom, image acquisition is largely viewed as a component disconnected from the rest of the image processing pipeline. The quality of acquired images is primarily determined by the hardware—e.g., the CMOS sensors embedded in smartphone cameras are often viewed as inferior in quality to the CCD sensors used by single-lens reflex (SLR) cameras. However, such a view can be challenged from several perspectives, such as the definition of quality and the scope of application.
Smartphone-based image acquisition can be made more intelligent along the same lines of reasoning as the emerging field of computational photography [16]—i.e., a hybrid (hardware + software) approach is more fruitful and could reshape our thinking about sensor design. In this subsection, we review three scenarios where a hybrid approach can shed novel insight on various image-related applications.

First, the quality of acquired images can be defined not in the context of human visual inspection but in terms of supporting vision-related applications (e.g., a barcode scanner on a smartphone). It is important to note that, when understood at the system level, what truly matters to many vision-related applications is whether the image quality is sufficient to support various recognition tasks. In an image processing system designed by the reduction principle, such an objective is difficult to meet because the sensor component has little knowledge about the recognition component (this is often called open-loop in the control theory literature). A more conceptually appealing approach is to allow the sensor component to adjust itself based on feedback from the recognition component (so-called closed-loop design). How does such adjustment help? Several smartphone applications, such as RedLaser and Scan, allow users to move the camera back and forth so that the image content of interest, such as a barcode or Quick Response (QR) code, can be acquired in focus and recognized by the prioritized barcode or QR-code recognition algorithm. We argue that such closed-loop design is a significant advantage supported by mobile imaging, where mobility can be put to better use at the system level.

Second, we believe intelligent image acquisition implies not only higher quality but also better security and privacy features. Security- and privacy-related applications involve the authentication of image content, copyright protection, and so on. It has long been recognized that the nature of digital images makes them vulnerable to copying, manipulation, and distribution in the virtual world. In addition to various existing technological means (e.g., watermarking and forensic analysis), smartphones offer a new opportunity for enhancing the security/forensics features of mobile imaging—namely, the spatial location of the smartphone recorded by the GPS sensor, together with timestamp information, becomes a unique spatio-temporal signature. Such authentication information collected at the physical layer—after proper protection such as cryptography—could offer a powerful line of defense against unauthorized distribution or malicious manipulation of image content. We expect that this line of research could become even more fruitful in the context of social networking (e.g., privacy protection and identity management).

Third, a hybrid approach toward mobile imaging can support the acquisition of photos with an artistic flavor. A common example is to slow the shutter speed for the purpose of creating a light trail in the image or an intentionally motion-blurred image (with artistic effect). Several iPhone applications (such as Slow Shutter Cam and WaterMyPhoto) have attracted wide attention and inspired the development of new photography applications on the iOS platform. Compared with the mainstream of computational photography, we believe artistic rendering has not received sufficient attention. In parallel with extending the capabilities of digital photography by computation, the exploitation of motion cues (both object-related and sensor-related) could offer fresh ideas about how visual thinking can support reasoning.
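To make the closed-loop idea concrete, consider the following sketch (our own illustration, not the implementation of RedLaser or Scan). It scores incoming preview frames with a simple sharpness measure—the variance of the Laplacian—and forwards only sufficiently sharp frames to a recognition stage. The frame source, the decode callback, and the threshold value are hypothetical placeholders.

```python
import cv2

def focus_measure(gray):
    # Variance of the Laplacian: a common sharpness score;
    # low variance indicates a blurred (out-of-focus) frame.
    return cv2.Laplacian(gray, cv2.CV_64F).var()

def closed_loop_capture(frames, decode, threshold=100.0):
    """Forward only sufficiently sharp frames to the recognizer.

    frames    -- iterable of BGR images (e.g., from a camera preview)
    decode    -- recognition callback (e.g., a barcode/QR decoder);
                 returns None on failure
    threshold -- focus score below which a frame is rejected
    """
    for frame in frames:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if focus_measure(gray) < threshold:
            continue  # reject; in an app, prompt the user to keep moving
        result = decode(frame)
        if result is not None:
            return result  # recognition succeeded; the loop closes
    return None
```

In a real application, a rejected frame would trigger user feedback (e.g., an on-screen prompt to move the camera closer), which is precisely the feedback path that closes the loop between sensing and recognition.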
4.08.4.2 Interactive image matting

Matting—a classical cinematography technique used to composite a foreground element into a new scene—has been widely used in the creation of special effects [17]. The technique of digital matting was first studied by Porter and Duff in [18]. The underlying mathematical model, or so-called composition equation, is given by C = αF + (1 − α)B, where 0 ≤ α ≤ 1 denotes the pixel's opacity and C, F, B correspond to the pixel's composite, foreground, and background colors, respectively.
It is often assumed that some anchor regions of foreground and background are available as a priori knowledge (e.g., supplied by the user) to facilitate matting. One of the earliest principled solutions to digital matting is based on a Bayesian approach [19]; since then, several alternative approaches (e.g., Poisson matting [20], spectral matting [21], the closed-form solution [22]) have been proposed in the literature. Despite their subtle technical differences, all of them require the fundamental assumption of foreground/background anchor regions, and the quality of those anchor regions is often crucial.

Smartphone-based mobile imaging makes it convenient to develop protocols for anchor region selection. After pressing the acquisition button, one can proceed to select anchor regions with one's fingers. For example, one might slide two fingers in opposite directions to define a rectangular region (such an anchor region can be used by GrabCut [23]), or apply varying pressure to tap the objects in the foreground and background (such anchor regions can be used by spectral matting [21]). Furthermore, one could even supply rough contour tracing results by drawing consecutive line segments on the screen (such an anchor region can be used by LazySnap [24]), or refine anchor regions by tapping on the zoomed image (to facilitate the user's interaction). Figure 8.1 includes several examples of anchor regions selected by different touch protocols; a third-party implementation of an interactive segmentation toolbox (including GrabCut and LazySnap) is available at http://www.cs.cmu.edu/~mohitg/segmentation.htm.

We note that the accuracy of any image matting technique is determined by how well the theoretical matting model matches the image data in practical acquisition. None of the published matting algorithms can achieve error-free segmentation results on all test images. Therefore, it becomes plausible to design a verification protocol so that the user can refine/polish the result of image matting iteratively. Such a line of reasoning leads to the design of image matting on mobile devices in a closed-loop fashion—the key idea is to bring the user into the loop to supply valuable feedback information to the matting algorithm. We believe such a paradigm of interactive image matting is of interest to both the image processing and human-computer interaction communities. Existing iPhone applications such as Color Effects appear to represent a first attempt along this direction; however, their object segmentation functionality is far from optimized (partially due to complexity constraints). It is expected that image matting with user feedback will lend itself more easily to tablet applications (e.g., iPad or Galaxy) with larger touchscreens and more computational resources.
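As a minimal numerical illustration of the composition equation above (not a reimplementation of any of the published matting algorithms cited here), the sketch below composites a foreground onto a new background given a matte and, conversely, recovers α at pixels where estimates of F and B are available—the least-squares projection implied directly by the equation. All arrays are hypothetical inputs.

```python
import numpy as np

def composite(alpha, F, B):
    """Apply the composition equation C = alpha*F + (1 - alpha)*B.

    alpha -- H x W matte with values in [0, 1]
    F, B  -- H x W x 3 float images (foreground and background)
    """
    a = alpha[..., np.newaxis]  # broadcast the matte over color channels
    return a * F + (1.0 - a) * B

def estimate_alpha(C, F, B, eps=1e-6):
    """Per-pixel least-squares alpha when F and B estimates are known:
    alpha = (C - B) . (F - B) / |F - B|^2, clipped to [0, 1]."""
    num = np.sum((C - B) * (F - B), axis=-1)
    den = np.sum((F - B) ** 2, axis=-1) + eps  # eps guards F == B pixels
    return np.clip(num / den, 0.0, 1.0)
```

In practice, the user-supplied anchor regions are what allow F and B to be estimated in the unknown region (e.g., by sampling nearby anchored pixels), which is where the published algorithms differ.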
4.08.4.3 Dynamic image mosaicing

Image mosaicing refers to stitching several images of the same scene together so that one can obtain a panoramic view [25]. Homography estimation—the process of deriving the geometric transformation relating one image to another—is at the foundation of all practical mosaicing techniques. A homography is essentially a 2D planar projective transform that can be estimated from a given pair of images. Depending on the nature of the scene geometry and camera motion, the corresponding 3 × 3 transform matrix A admits 3, 6, or 8 degrees of freedom for rigid, affine, and projective transformations, respectively. To estimate A, it is often desirable to have a sufficient number of corresponding feature points from the given pair of images [26]. In the computer vision literature, the problem of finding corresponding feature points has been extensively studied over the past 20 years. The scale-invariant feature transform (SIFT) [27,28] and its variants (e.g., SURF [29], affine-SIFT [30]) have been widely used in various applications, including image mosaicing.
FIGURE 8.1 Examples of producing anchor regions on touchscreens: (a) sliding thumb and index finger defines a rectangular foreground; (b) tapping with varying pressure defines foreground (color-coded yellow) and background (color-coded green); (c) finger sliding generates a rough contour of the object; (d) refining the definition of F/B by tapping on zoomed regions. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this book.)
However, it should be noted that there is still a significant gap between these artificial algorithms (our best engineering endeavors) and the human vision system (nature's evolutionary end-result). Therefore, it is desirable to exploit the interaction between humans and computers to improve the accuracy of feature point correspondences. A MATLAB tutorial on image mosaicing based on simple user feedback can be found at http://www.pages.drexel.edu/~sis26/MosaickingTutorial.htm. Figure 8.2 shows two possible protocols for users to specify corresponding feature points on the touchscreen. Tapping or sliding the fingers is arguably more convenient on a mobile device with a relatively large display screen (e.g., an iPad rather than an iPhone).
FIGURE 8.2 (a) User specifies corresponding feature points by tapping on touch-screen of a tablet (the first and second pairs are color-coded differently); (b) an alternative way of specifying corresponding anchor regions by sliding thumb and index fingers (red rectangles indicate the selection results after sliding). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this book.)
However, it is also possible to automatically find a pool of potential matching points (e.g., by SIFT) and ask the user to control only a threshold parameter by sliding a finger on a virtual display bar; alternatively, a mobile device can display all matching points and ask the user to select those that are visually more plausible (e.g., by sliding along the lines connecting corresponding SIFT keypoints).

We believe that mobile imaging could redefine image mosaicing by enriching the user's interactive experience. For example, it has been proposed that the user can use either the accelerometer of a smartphone or finger movement to control the viewing angle into the panoramic image (please refer to Figure 8.3). We can also imagine that the street-view functionality offered by Google Earth could be implemented on a mobile platform, so that a user can not only vary the viewpoint but also wander around the point of interest (including zooming in or out). As demonstrated by the authors of ztitch, interactive weaving of video content (called veaver) could deliver a richer multimedia experience than still images alone (e.g., moving objects across different views could be stitched together and played in synchronization).
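A minimal sketch of this automatic pipeline follows, assuming an OpenCV build in which the SIFT implementation is available (cv2.SIFT_create in recent releases): detect keypoints, keep matches that pass Lowe's ratio test—the ratio being exactly the kind of threshold a user could adjust with an on-screen slider—and fit the 3 × 3 matrix A robustly with RANSAC.

```python
import cv2
import numpy as np

def estimate_homography(img1, img2, ratio=0.75):
    """Match SIFT features between two images and fit a homography.

    ratio -- Lowe's ratio-test threshold; in an interactive app this
             is the parameter a user could adjust with a slider.
    """
    g1 = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY)
    g2 = cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY)
    sift = cv2.SIFT_create()
    k1, d1 = sift.detectAndCompute(g1, None)
    k2, d2 = sift.detectAndCompute(g2, None)

    # For each descriptor in img1, find its two nearest neighbors in img2
    # and keep the match only if the best is clearly better than the second.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    good = [m for m, n in matcher.knnMatch(d1, d2, k=2)
            if m.distance < ratio * n.distance]

    src = np.float32([k1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

    # RANSAC rejects the remaining outliers while fitting the 3x3 matrix A.
    A, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return A, good, inlier_mask
```

Loosening the ratio yields more (but noisier) candidate matches, while tightening it yields fewer, more reliable ones—exactly the tradeoff the user's slider would expose.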
4.08.4.4 Supervised image restoration

Image restoration refers to the recovery of an image from its degraded version. Depending on the degradation model, image restoration includes inpainting, deblurring, denoising, and so on. In the past, image restoration research has focused primarily on finding good prior models for photographic images and deriving so-called regularized restoration algorithms. However, in many practical scenarios related to image restoration, such as cultural heritage preservation and personal photo repair, the identification of degraded regions is a difficult task for computer vision. For example, despite the abundance of published papers on image inpainting (e.g., [31–34]), it is often assumed that the inpainting domain is given a priori; most existing works on motion deblurring (e.g., [35–38]) assume a global (spatially-invariant) blur model and therefore cannot handle the blur caused by object motion.
FIGURE 8.3 An interactive image stitching application, ztitch, developed on Windows Phone 7 by Andrew Au and Jie Liang at Simon Fraser University. Video demonstrations (Veaver, Ztitch1, and Ztitch2) accompany the electronic version of this book.
Human-computer interaction can facilitate the task of image restoration in several complementary ways. First, humans can identify degraded regions and mark them, as shown in Figure 8.4. One might wonder whether we should pursue an automatic approach toward this objective; we argue that the definition of "problematic regions" is context-dependent and varies from application to application. Therefore, it is more desirable and efficient to bring humans into the loop than to enumerate all possible degradation models.

Second, humans can supply valuable clues to help machines with the restoration task. As shown in Figure 8.4b, line constraints along the edges of a building can be conveniently highlighted by a user to facilitate the task of inpainting [39]; a user-specified region in which to search for similar patches can also greatly reduce the computational complexity of various patch-based inpainting techniques.

Third, touch-based control can allow a user to adjust the exposure time while focusing with a second finger, or to selectively focus on an object in the foreground. Although the standard camera on a smartphone lacks the zoom capability of SLR cameras, rapid advances in interpolation and super-resolution (SR) techniques [40–43] have significantly reduced the performance gap between optical and digital zooming. Stabilization—insensitivity to the shaking of hands—is also facilitated by a computational camera: one can either take a sequence of pictures consecutively (so-called burst mode) or control the shutter speed (e.g., coded aperture). The accelerometer in a smartphone could also offer relevant tilting information to a motion deblurring algorithm (e.g., [35]).

Finally, the idea of putting restoration and recognition in a single loop has received increasing attention in recent years [44,45]. As mentioned before, putting image acquisition and recognition in a loop has shown promising results in barcode-related applications. It is natural to extend this idea further by putting restoration into the loop.
FIGURE 8.4 Examples of interactive image restoration: top—the inpainting domain is marked in magenta (line constraints are marked in R/G/B); bottom—a region suffering from local motion blur can be picked out with a finger on a touchscreen. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this book.)
For example, we can design a specific SR technique for 1D barcodes [46], or a particular blind deconvolution algorithm for optical blur, that can potentially reduce scanning time. The conventional face recognition problem could also benefit from such ideas of restoration-and-recognition in a loop—namely, under the assumption that a face recognition algorithm is chosen, how can we exploit the mobility of the sensor to maximize its performance? In contrast to software-based approaches such as [45], mobile imaging offers an attractive hardware-based alternative with little computational burden on the computing device.
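As a minimal sketch of the first idea above—restoring only what the user has marked—the code below builds an inpainting mask from finger strokes and fills it with OpenCV's built-in fast-marching (Telea) inpainting. This stands in for, but is not, the exemplar-based algorithms cited above [31–34]; the stroke list is a hypothetical stand-in for touchscreen input.

```python
import cv2
import numpy as np

def inpaint_marked_region(image, strokes, brush=10, radius=3):
    """Inpaint only where the user has drawn strokes.

    image   -- 8-bit BGR input photo
    strokes -- list of (x, y) touch points marking the degraded region
    brush   -- radius (pixels) painted around each touch point
    radius  -- neighborhood radius used by the inpainting algorithm
    """
    mask = np.zeros(image.shape[:2], dtype=np.uint8)
    for x, y in strokes:
        # Dilate each touch point into a filled disk on the mask.
        cv2.circle(mask, (int(x), int(y)), brush, 255, thickness=-1)
    # Telea's fast-marching inpainting fills the masked pixels
    # from the surrounding intact content.
    return cv2.inpaint(image, mask, radius, cv2.INPAINT_TELEA)
```

In a closed-loop app, the restored result would be displayed immediately so the user can add strokes and re-run until satisfied.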
4.08.5 Applications

In this section, we discuss a range of potential applications at the intersection of mobile imaging and computing. The new philosophy for designing novel applications on the mobile platform is twofold. One part follows the principle of miniaturization—there is plenty of room to explore at the bottom of the scale; the other follows the principle of mobility—how does this new feature reshape our thinking about the tradeoff between resource limitation and quality delivery? We hope this section can motivate both researchers and developers to explore further along the road of image processing at the fingertips.
4.08.5.1 Mobile photography

Dozens of applications in the photography category (e.g., Camera+, 360 Panorama, Camera Genius, and Slow Shutter Cam) have been developed on the iOS platform. Many functionalities mentioned above, such as finger-controlled image acquisition, panoramic image mosaicing, and motion-based artistic effects, have been at least partially implemented by these applications. A more fundamental issue that we attempt to address here is: how will the mobile imaging/computing platform impact the field of photography? It is our conviction that a good high-level vision often offers useful guiding principles for developing killer applications.

First, the gap between professionals and amateurs could diminish substantially. Historically, the finest art of photography has been mastered by only a small group of professionals with access to expensive equipment and educational programs. As technology advances, the barrier to high-quality equipment has been removed, enabling millions of amateurs to become artistic photographers on a daily basis. The new generation growing up with mobile devices such as the iPhone and iPad is likely to have better intuition about photography than its ancestors. New applications that help amateurs master the art through casual play instead of formal training could reshape society's perception of the photography profession.

Second, commercial photography will operate on an unprecedented scale but face new challenges (e.g., copyright and privacy protection). Advertisement, fashion, food, landscape, and wildlife photography are examples of traditional commercial uses of photography. As the gap between professionals and amateurs decreases, a new business model could appear—namely, anyone can get paid for taking a good picture at the right place and time (e.g., of tornados, floods, etc.). As a concrete example, small tasks such as taking a passport photo will no longer hold any commercial value, while the operational cost of photojournalism (e.g., National Geographic) could be dramatically reduced. New apps supporting such a paradigm shift are likely to gain momentum.

Third, mobile photography might become an indispensable tool in forensics. Forensic photography is technically challenging largely due to the location uncertainty of crime scenes. Video surveillance evidently addresses this challenge to some extent, but mobile photography offers a valuable supplement. People already use smartphones to collect evidence of car accidents and vandalism; cameras installed on traffic lights have the potential to catch red-light runners or illegal U-turns. Any new app devoted to the high-quality acquisition of objects of interest (e.g., a human face or vehicle license plate) in a timely manner could be useful to law enforcement.
4.08.5.2 Computer vision and pattern recognition

Traditionally, barcode scanners were designed and adopted by stores to help customers with price-finding or self-checkout. The barcode/QR code scanning and recognition functionality provided by smartphones has made it convenient to compare prices on the web or to self-check-in at the airport. A conceivably useful application along this line is to "integrate" various barcode-based ID cards into a single smartphone. Many competing grocery stores, game stores, and public libraries require their customers to present membership cards before receiving service or promotions. The apparent disadvantages of carrying those plastic cards include too-many-to-carry, inevitable physical damage, and easy-to-lose. It would be highly desirable to store a variety of barcode-based membership information in a single device such as a smartphone. How to assure that the barcode image stored on a smartphone can be recognized by conventional devices (e.g., laser-based scanners) is a technical challenge for the future.

Another collection of tools facilitated by mobile imaging relates to optical character recognition (OCR). Historically, many OCR-related techniques have been studied and developed to facilitate various applications, including tax returns, law enforcement, digital libraries, and so on. The mobility of smartphones enables us to turn information content in the physical world into an image format (e.g., a printed phone number or bus schedule, presentation slides at a conference, a handwritten recipe at a friend's house). If text information acquired in pictorial format can be converted back to textual form, it will save space and make the content convenient to edit and search. Some iPhone applications (e.g., ImageToText developed by Ricoh) have received favorable reviews due to their ease of use and good conversion accuracy. A similar application converting images of line drawings back to their original format would be another welcome tool.

Biometrics such as fingerprint or face scanners have been conceived as desirable tools to protect the security of smartphones. The hardware of smartphones can easily support the acquisition of fingerprint and face images. What is still lacking seems to be on the software side—namely, robust and efficient recognition algorithms that can operate on a lightweight Android or iOS system. How to exploit the tradeoff between recognition performance and computational complexity to support mobile computing deserves attention from both the biometrics and embedded systems communities. In particular, how to make the best use of a hybrid approach (similar to barcode recognition) in the context of biometric recognition appears to be an under-researched problem.
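As a minimal sketch of the scanning functionality discussed above—assuming the open-source pyzbar wrapper around the ZBar library is installed (pip install pyzbar), which is one of several possible decoders rather than what any particular app uses—a single camera frame can be decoded as follows; the frame variable is a hypothetical OpenCV image.

```python
import cv2
from pyzbar.pyzbar import decode  # open-source wrapper around ZBar

def scan_codes(frame):
    """Decode all barcodes/QR codes visible in one camera frame.

    Returns a list of (symbology, payload) pairs,
    e.g. [("QRCODE", "http://example.com")].
    """
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return [(sym.type, sym.data.decode("utf-8")) for sym in decode(gray)]
```

Combined with the focus check sketched in Section 4.08.4.1, this forms the recognition half of a closed acquisition-recognition loop.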
4.08.5.3 Human network interaction

Social networks have made their way into our lives through Facebook, LinkedIn, Twitter, and Google+. The mobility and communication capability of smartphones enable their users to interact with not just one machine but a network of smartphone users. What can we foster at the intersection of social networks and touch computing? We will sketch several proof-of-concept apps that can be viewed as novel ways of human interaction enabled by the network:

1. Mobile verification: Some smartphones are already equipped with the capability of scanning a fingerprint with a swipe sensor and using it as a biometric to verify the user's identity; other modalities such as face and voice are also potential candidates for biometric identity. It is our biased opinion that smartphones offer an attractive platform for testing biometrics. Just as 2D barcodes are becoming more popular, mobile verification of humans could bring commercial value.
2. Touch-to-know: Imagine that at a conference, instead of staring at a name tag, you can use your smartphone to find out who the person in front of the poster is. Take a picture of him/her; touch the picture, and the smartphone will be smart enough to connect you through LinkedIn. The idea is to help people make more friends in a mobile way.

3. Who-is-nearby: GPS-based driving assistance systems already have some capability of sharing traffic information among nearby drivers. Similarly, you might want to know whether your old friends are nearby at a conference, or to track where they are. Instead of supplying the coordinates of some place to the GPS, you can command your smartphone to take you to where your friend is.

Another important class of applications enabled by human network interaction is education. As the internet keeps growing, it has become an important educational resource accessible to everyone. To find answers, people can search the web instead of the library. Therefore, the design of interactive educational software could follow a pattern similar to that of computer games—i.e., moving from conventional platforms (e.g., PCs, consoles) to mobile ones. Plenty of opportunities exist there—e.g., when a kid spots a plant or animal and is curious about its name, he/she can take a picture and look it up on the web; a child can learn a second language by touching words in his/her native language; anyone can look up a word in a dictionary or Wikipedia by simply tapping on it with varying pressure. As visualization/display technology evolves, one can imagine that in the long run people will interact with the cyber-world by dancing their fingers not on a touchscreen but in the air—as in the movie "Minority Report."
4.08.6 Open issues and problems

Mobility has played a fundamental role in the evolution of the human species. Therefore, it is not surprising to observe that mobile imaging and mobile operating systems (OS) represent the new trend of sensing and computing in the next decade. We believe that this new trend will have a dramatic impact on the field of image processing—from the keyboard/mouse on desktops to fingertips on tablets/smartphones. As the pioneer of cybernetics Norbert Wiener said, "the most fruitful areas for the growth of sciences were those which had been neglected as a no-man's land between the various established fields." It is likely that image processing at the fingertips will occupy the no-man's land at the intersection of microelectronics, embedded systems, signal processing, computer vision, human-computer interaction, social networking, and software engineering. This article has only scratched the surface of this emerging area; many issues are still open, and technical problems remain to be attacked. In this last section, we supply a list of open issues and problems for future study.

• The impact of miniaturization: Many achievements of mobile devices would not have been possible without advances in the art of miniaturization. Making things smaller is well aligned with the principle of mobility, which supports integration at the system level. How do we make the smartphone even smarter? It has already integrated a phone, GPS, camera, and LCD display into one device. What if it could become an even more versatile device, so that it could also be used for financial transactions (i.e., integrating smartcards into smartphones), personal identification (e.g., integrating your driver's license into your phone), access control (so real estate agents do not need a password to retrieve keys), and cosmetic accessories (e.g., one would not need to carry a mirror for make-up because the smartphone could provide cosmetic parameter information)? In particular, if our phone can become our third eye—the one that helps us see far, see small, see in the dark, or even see through our bodies—such a device could become an indispensable component of our daily lives.

• Integrate sensing and processing: Historically, image acquisition and processing have evolved almost independently (one tied to hardware design, the other belonging to software development). For the first time, mobile devices such as smartphones and tablets provide a platform supporting the joint optimization of sensing and processing. The popularity of various barcode and QR code scanners is only one representative of this remarkable trend. Many other image processing and analysis tasks could benefit from our advocated closed-loop design philosophy—it is less like fancy mathematics such as compressed sensing and more like practical engineering such as quality control. However, we argue that its intellectual merit lies in being experimentally reproducible, because the mobile device has to work with real-world image data and adapt to them through physical motion. Even though this is only the first step toward understanding the fundamental role of mobility in the evolution of intelligence, we argue it marks a significant advantage of advocating experimentally reproducible research.

• Support research via development: Conventional wisdom emphasizes that development comes after research because the latter is believed to be more fundamental. Unfortunately, the boundary between basic and applied research has become more and more vague; moreover, basic research is often thought of as a luxury in a stressed economic environment. Apple's miraculous success in business might suggest an alternative path—starting from real-world applications and reasoning backward logically. What kind of tools do we need, or need to invent, to solve a particular problem? If Karl Popper's falsificationism about scientific theory is correct, then experiments (or experimentally reproducible research) could also suggest a fruitful approach toward creating new theories. Nevertheless, without our ancestors' diligent observational data, it would have been much more difficult for even a genius like Isaac Newton to establish his theory of classical mechanics. Maybe we—as engineers—can be more proud of our profession if we not only improve quality of life through inventions but also perceive our work as an experimental approach toward understanding how nature works.
A. Appendix: course material, source codes and datasets

In this Appendix, we provide a list of links to various online resources related to image processing at the fingertips. Since this is an emerging field, we expect that more resources will become available as the technology evolves.

• Teaching material collection: At this point, the course EE368 offered by Prof. Bernd Girod at Stanford seems to be the only one covering image processing on smartphones. Some useful background material related to the Android OS can be found at http://www.stanford.edu/class/ee368/Android/. A list of course projects from previous years can be accessed at http://www.stanford.edu/class/ee368/Project_10/index.html. The well-known open-source project developed by the vision community (OpenCV) is still migrating to the Android platform: http://opencv.willowgarage.com/wiki/Android.

• Source code collection: A list of links to reproducible research in image processing, computer vision/graphics, and machine learning can be found at http://www.csee.wvu.edu/~xinl/source.html. Additionally, the authors of the ztitch software (Andrew Au and Jie Liang) have kindly made their source code available online at http://www.ztitch.com/source.html. Another related open-source software package developed for Windows Phone can be found at http://picfx.codeplex.com/.
• Image dataset collection: A collection of links to various video/image databases on the web can be found at http://www.csee.wvu.edu/~xinl/database.html. The most well-known application for sharing photos acquired by smartphones is likely Instagram (http://en.wikipedia.org/wiki/Instagram). The online photo website flickr has created user groups for different mobile OSes—e.g., http://www.flickr.com/groups/iphone/ and http://www.flickr.com/groups/android/.
References

[1] R. Feynman, There's plenty of room at the bottom, Eng. Sci. 23 (5) (1960) 22–36.
[2] D. Litwiller, CCD vs. CMOS: facts and fiction, Photon. Spectra 1 (2001) 154–158.
[3] E. Fossum, CMOS image sensors: electronic camera-on-a-chip, IEEE Trans. Electron Dev. 44 (10) (1997) 1689–1698.
[4] M. Madou, Fundamentals of Microfabrication: The Science of Miniaturization, CRC, 2002.
[5] K. Jain, S. Rice, B. Lin, Ultrafast deep UV lithography using excimer lasers, Polym. Eng. Sci. 23 (18) (1983) 1019–1021.
[6] B. Carlson, Comparison of modern CCD and CMOS image sensor technologies and systems for low resolution imaging, in: Sensors, Proceedings of IEEE, vol. 1, IEEE, 2002, pp. 171–176.
[7] A. Moini, Image sensor architectures, in: Smart Cameras, Springer, 2010, pp. 81–96.
[8] R. Ng, M. Levoy, M. Bredif, G. Duval, M. Horowitz, P. Hanrahan, Light field photography with a hand-held plenoptic camera, Computer Science Technical Report CSTR, vol. 2, 2005.
[9] H. Benko, A. Wilson, P. Baudisch, Precise selection techniques for multi-touch screens, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2006, pp. 1263–1272.
[10] F. Beck, B. Stumpe, Two devices for operator interaction in the central control of the new CERN accelerator, European Organization for Nuclear Research, CERN, 1973.
[11] N. Mehta, A flexible machine interface, Master's Thesis, University of Toronto, Department of Electrical Engineering, 1982.
[12] L. Nakatani, J. Rohrlich, Soft machines: a philosophy of user-computer interface design, in: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ACM, 1983, pp. 19–23.
[13] P. Wellner, The DigitalDesk calculator: tangible manipulation on a desk top display, in: Proceedings of the Fourth Annual ACM Symposium on User Interface Software and Technology, 1991, pp. 27–33.
[14] P. Dietz, D. Leigh, DiamondTouch: a multi-user touch technology, in: Proceedings of the 14th Annual ACM Symposium on User Interface Software and Technology, 2001, pp. 219–226.
[15] J. Han, Low-cost multi-touch sensing through frustrated total internal reflection, in: Proceedings of the 18th Annual ACM Symposium on User Interface Software and Technology, 2005, pp. 115–118.
[16] R. Raskar, J. Tumblin, Computational Photography: Mastering New Techniques for Lenses, Lighting, and Sensors, AK Peters, Ltd., Natick, MA, USA, 2009.
[17] R. Fielding, The Technique of Special Effects Cinematography, Focal Press, 1985.
[18] T. Porter, T. Duff, Compositing digital images, ACM SIGGRAPH Comput. Graph. 18 (3) (1984) 253–259.
[19] Y.-Y. Chuang, B. Curless, D.H. Salesin, R. Szeliski, A Bayesian approach to digital matting, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, 2001, pp. 264–271.
[20] J. Sun, J. Jia, C. Tang, H. Shum, Poisson matting, in: ACM SIGGRAPH, 2004, pp. 315–321.
[21] A. Levin, A. Rav-Acha, D. Lischinski, Spectral matting, in: IEEE Conference on Computer Vision and Pattern Recognition, CVPR '07, June 2007, pp. 1–8.
[22] A. Levin, D. Lischinski, Y. Weiss, A closed-form solution to natural image matting, IEEE Trans. Pattern Anal. Mach. Intell. (2007) 228–242.
[23] C. Rother, V. Kolmogorov, A. Blake, GrabCut: interactive foreground extraction using iterated graph cuts, ACM Trans. Graph. (TOG) 23 (2004) 309–314.
[24] Y. Li, J. Sun, C. Tang, H. Shum, Lazy snapping, ACM Trans. Graph. (TOG) 23 (2004) 303–308.
[25] R. Szeliski, Image mosaicing for tele-reality applications, IEEE Comput. Graph. Appl. 16 (2) (1996) 22–30.
[26] R. Hartley, A. Zisserman, Multiple View Geometry in Computer Vision, second ed., Cambridge University Press, 2003.
[27] D. Lowe, Object recognition from local scale-invariant features, in: Proceedings of IEEE International Conference on Computer Vision, 1999.
[28] D.G. Lowe, Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vis. 60 (2004) 91–110.
[29] H. Bay, T. Tuytelaars, L. Van Gool, SURF: speeded up robust features, in: ECCV 2006, 2006, pp. 404–417.
[30] J. Morel, G. Yu, ASIFT: a new framework for fully affine invariant image comparison, SIAM J. Imag. Sci. 2 (2) (2009) 438–469.
[31] M. Bertalmio, G. Sapiro, V. Caselles, C. Ballester, Image inpainting, in: Proceedings of SIGGRAPH, New Orleans, LA, 2000, pp. 417–424.
[32] A. Criminisi, P. Perez, K. Toyama, Region filling and object removal by exemplar-based image inpainting, IEEE Trans. Image Process. 13 (2004) 1200–1212.
[33] J. Sun, L. Yuan, J. Jia, H.-Y. Shum, Image completion with structure propagation, in: ACM SIGGRAPH, 2005.
[34] I. Drori, D. Cohen-Or, H. Yeshurun, Fragment-based image completion, in: ACM SIGGRAPH 2003, 2003, pp. 303–312.
[35] R. Fergus, B. Singh, A. Hertzmann, S.T. Roweis, W.T. Freeman, Removing camera shake from a single photograph, ACM Trans. Graph. 25 (3) (2006) 787–794.
[36] Q. Shan, J. Jia, A. Agarwala, High-quality motion deblurring from a single image, in: ACM SIGGRAPH 2008 Papers, 2008, pp. 1–10.
[37] S. Cho, S. Lee, Fast motion deblurring, ACM Trans. Graph. (TOG) 28 (2009) 145.
[38] O. Whyte, J. Sivic, A. Zisserman, J. Ponce, Non-uniform deblurring for shaken images, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010, pp. 491–498.
[39] C. Barnes, E. Shechtman, A. Finkelstein, D. Goldman, PatchMatch: a randomized correspondence algorithm for structural image editing, ACM Trans. Graph. (TOG) 28 (2009) 24.
[40] X. Li, M. Orchard, New edge directed interpolation, IEEE Trans. Image Process. 10 (2001) 1521–1527.
[41] W.T. Freeman, T.R. Jones, E.C. Pasztor, Example-based super-resolution, IEEE Comput. Graph. Appl. 22 (2002) 56–65.
[42] D. Glasner, S. Bagon, M. Irani, Super-resolution from a single image, in: ICCV, 2009.
[43] G. Freedman, R. Fattal, Image and video upscaling from local self-examples, ACM Trans. Graph. (TOG) 30 (2) (2011) 12.
[44] M. Gupta, S. Rajaram, N. Petrovic, T. Huang, Restoration and recognition in a loop, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005, pp. 638–644.
[45] H. Zhang, J. Yang, Y. Zhang, T. Huang, Close the loop: joint blind image restoration and recognition with sparse representation prior, in: IEEE International Conference on Computer Vision, 2011.
[46] F. Champagnat, C. Kulcsar, G. Le Besnerais, Continuous super-resolution for recovery of 1-D image features: algorithm and performance modeling, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, 2006, pp. 916–926.