Computer Methods and Programs in Biomedicine 95 (2009) 258–269
journal homepage: www.intl.elsevierhealth.com/journals/cmpb
A wearable real-time image processor for a vision prosthesis

D. Tsai a, J.W. Morley b,c, G.J. Suaning a, N.H. Lovell a,∗

a Graduate School of Biomedical Engineering, University of New South Wales, Sydney, NSW 2052, Australia
b School of Medicine, University of Western Sydney, Sydney, NSW 1797, Australia
c School of Medical Sciences, University of New South Wales, Sydney, NSW 2052, Australia
Article history: Received 3 July 2008; received in revised form 10 December 2008; accepted 13 March 2009

Keywords: Embedded image processing; Retinal prosthesis; Bionic eye; Macular degeneration; Retinitis pigmentosa

Abstract

Rapid progress in recent years has made implantable retinal prostheses a promising therapeutic option in the near future for patients with macular degeneration or retinitis pigmentosa. Yet little work on devices that encode visual images into electrical stimuli has been reported to date. This paper presents a wearable image processor for use as the external module of a vision prosthesis. It is based on a dual-core microprocessor architecture and runs the Linux operating system. A set of image-processing algorithms executes on the digital signal processor of the device, which may be controlled remotely via a standard desktop computer. The results indicate that a highly flexible and configurable image processor can be built with the dual-core architecture. Depending on the image-processing requirements, general-purpose embedded microprocessors alone may be inadequate for implementing image-processing strategies required by retinal prostheses.

Crown Copyright © 2009 Published by Elsevier Ireland Ltd. All rights reserved.

1. Introduction
Several research groups are developing retinal prostheses as a potential treatment for retinal degenerative diseases such as retinitis pigmentosa and age-related macular degeneration [1–4]. Many proposed designs require an image-processing device for acquiring images from a camera and transforming them into stimulus commands to configure the stimulation to be delivered by the implant. Details of the epi-retinal prosthesis being developed in the authors' laboratory have been reported previously [5]. In brief, a camera worn by a patient captures an image and sends it to a portable processor, which decomposes it into stimulus command sequences; these are then transmitted to an implanted device located within the eye via a transcutaneous radio frequency (RF) link. Electronics in the implant decode the signal and use the incident energy of the transmitter to activate the microelectrodes (Fig. 1). It is clear
from the above that the image processor plays an integral role in converting the visual scene into a representation for modulating electrical stimuli. Such a device is not unique to the described epi-retinal prosthesis. Vision prostheses proposed by several other groups also necessitate the use of an image processor. Notwithstanding its importance, little work has been reported on the hardware and software for such devices.

The design of such an image processor presents considerable challenges. The device should provide support for all essential activities that vision-impaired subjects may need during their daily life, including navigation [6], object recognition [7], facial recognition [8] and even reading [9–11]. However, vision prostheses of the near future are expected to contain only limited numbers of electrodes, thus constraining the visual resolution implant recipients are likely to perceive. Psychophysical studies of simulated pixelized vision in sighted subjects suggest that non-trivial image processing may be required in order to maximally utilize the limited number of available electrodes [12–14].
Fig. 1 – Components of a vision prosthesis. The image processor encodes images from the camera into stimulus commands and transmits them to the retinal implant via a wireless RF link. The remote supervising tool allows clinicians and researchers to configure the image processor using a separate computer.
Additionally, the image processor hardware needs to be portable yet able to perform possibly complex image-processing algorithms in real-time. The software on the device needs to be optimized for efficiency in power consumption and execution speed. Image acquisition and processing parameters should be easily configured to adapt to the individual requirements of patients. It has also been suggested that devices to be used in preliminary human trials should be equipped with a number of image-processing strategies in order to validate the efficacy of each [15]. Therefore it is prudent to have a scalable software architecture to support changes as the need arises.

Despite the aforementioned design considerations, only a few studies have hitherto addressed these issues. Buffoni et al. [15] reviewed psychophysical findings of pixelized vision, discussed the key constraints of image processing for vision prostheses and described a number of possible image-processing strategies. More recently, as part of their implementation of a sub-retinal vision prosthesis, Asher et al. [16] have presented a set of software algorithms for tracking, cropping, geometrically transforming, and filtering images from a camera into stimulation commands.

There are several advantages associated with stand-alone wearable image processors. Powerful hardware may be essential for executing the algorithms involved; placing such hardware in the implant would lead to more complex circuitry, higher power consumption, larger physical dimensions and greater heat dissipation, all of which are at odds with the requirements of an implantable device. By delegating image-processing tasks to an external processor the implant is freed from these additional complications. Furthermore, the implant will most certainly be a custom-designed application specific integrated circuit (ASIC),
whereas the image processor can be built from commercially available components, many of which are already optimized for performance, power efficiency and size; the savings in hardware and software development time and effort will therefore be significant. An external image processor is also more amenable to upgrades.

In this paper we present a wearable image processor based on a Texas Instruments OMAP processor running a customized version of Linux. A set of purpose-built software libraries and programs operate on the device to acquire images from the camera in real-time, perform image processing, and produce outputs, which can then be sent to a transmitter for subsequent delivery to the implant via an RF link. The software components have been carefully designed to provide real-time performance. The image processor is a stand-alone device capable of independent operation while being worn by a patient. It supports Bluetooth technology so that subjects can control various image-processing parameters using other Bluetooth-enabled devices. A complete suite of remote configuration tools has also been incorporated into the system, allowing researchers and clinicians to control the image processor as well as to experiment with various settings without physically accessing the device. Our work also indicates that, depending on the image-processing algorithms involved, the hardware platform needs to be selected carefully in order to attain real-time performance.
2. The image processor

2.1. Hardware
The hardware was manufactured by Spectrum Digital Incorporated (Stafford, TX, USA) and subsequently modified in our laboratory for further functionality. The device components relevant to the current discussion are summarized in Fig. 2. At the heart of the image processor is an OMAP5912 microprocessor (Texas Instruments, Dallas, TX, USA), which is primarily designed for embedded multimedia systems such as personal digital assistants (PDAs) or portable medical devices. The OMAP5912 has a dual-core architecture comprising an ARM9TDMI general-purpose reduced instruction set microprocessor designed by ARM Limited (Cambridge, UK) and a TMS320C55x digital signal processor (DSP, Texas Instruments).
Fig. 2 – System hardware of the image processor. Peripherals are connected through USB. The image processor can communicate with other devices via Ethernet, RS232 and Bluetooth. Furthermore, flash memory and SDRAM are available for data storage.
For simplicity, we will refer to the ARM9TDMI core on an OMAP5912 microprocessor as the ARM core. During normal operation the ARM core runs in host mode. It interfaces with most hardware components, executes the operating system, and delegates mathematically intensive operations to the DSP. The DSP has a separate clock, so it operates independently of the ARM core, allowing for parallel processing. The image processor has SDRAM and flash memory for data storage. An Ethernet port allows the system to be connected to a network. Peripherals may be connected to the device using RS232 serial or USB. While the OMAP5912 has dedicated hardware pins for attaching a CCD camera, we have opted for a common PC webcam through the USB bus instead, as CCD cameras with the appropriate wiring format are unavailable. The Bluetooth transceiver was also connected to the system via USB. At present, subjects can control various features of the image processor (e.g. zoom and smoothing) using a Bluetooth-enabled cell phone. A photo of the device is shown in Fig. 3.

Fig. 3 – Photograph of the image processor with only the camera attached.

2.2. Operating system

A version of the Linux kernel (the core of an operating system) for the ARM architecture has been under active development by the open source community for a number of years. The image processor uses a recent version (2.6.18) of the kernel in order to support specialized features and hardware. The source code was patched for the OMAP processor and parts rewritten to fix hardware driver issues. We used the "DSP Gateway" device driver [17] to facilitate communication between the two microprocessor cores. Some code modifications were necessary to make it compatible with the version of the Linux kernel used. The root file system containing all software libraries, utilities, and applications was specially prepared such that it only required 6.1 MB of storage space.

2.3. Software architecture

The system consists of three major components: the DSP program, the ARM program, and the remote supervising tool. The latter runs on a remote computer (e.g. a laptop), while the former two programs run on the image processor itself. The two cores on the OMAP have different instruction set architectures and hence separate binary executables are required to implement the on-board image-processing software. The ARM program is the master element of the system in that it performs the following tasks:

1. Initializes hardware/software resources on start up.
2. Acquires image frames from the camera continuously.
3. Delegates image-processing tasks to the DSP program.
4. Transfers stimulus commands to the output interface(s).
5. Handles network communication requests from the Bluetooth interface.
6. Listens and responds to commands from the connected remote supervising tool.
7. Releases resources on shutdown.

The ARM program runs continuously unless terminated by the user. The purpose of the DSP program is to process camera images when instructed by the ARM program. The ARM program can also reconfigure the processing parameters as required. The main purpose of the remote supervising tool is to control the image-processing parameters of the DSP program, including:

• The application of image manipulation routines (e.g. smoothing) on the camera output.
• Selection of the sampling technique for computing electrode stimulus values.
• Panning the camera over both X and Y axes.
• Sampling field size adjustment (zooming).
• Fine-tuning of electrode stimulus intensity.

In addition, the remote supervisor can display both raw camera images and processed results from the image-processing device in either real-time or off-line mode. System statistics are also displayed to aid development and diagnostics. It should be noted that the supervisor does not need to be in operation for the remainder of the system to function. In fact, it can be connected and disconnected at any time. The ARM program on the image processor handles such transitions without affecting the rest of the system.

The DSP and ARM programs can be divided into several modules, as illustrated in Fig. 4. The Main module conglomerates the functionality of all other system components. The Bluetooth and Camera modules provide access to the Bluetooth and camera hardware, respectively; the Linux kernel exposes only low-level system calls for this hardware, and these modules present a more convenient programming interface. Buffering is also performed to minimize delays associated with hardware access. The DSP module facilitates communication between the DSP program and the other ARM modules. The Supervisor module services commands from the remote supervisor. This is assisted by Net Operations, which performs all the reading and writing functions over a TCP/IP network.
Fig. 4 – Components of the image processor software. The DSP and ARM programs run on the DSP and ARM cores of the OMAP processor, respectively. The former, being responsible for image-processing tasks, is relatively small, consisting of only two modules. The latter handles the remaining tasks, including DSP control and interfacing with external entities.
The Output Interfaces module is responsible for sending processed results from the DSP to one or more system outputs, such as an RF link transmitter. The Frame Processor is the main module of the DSP program, performing all image-processing operations. The Memory Allocator is a purpose-built memory manager for the DSP.
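To make the division of labour concrete, the following minimal sketch shows the shape of the ARM program's per-frame loop as enumerated above. All type and function names here are illustrative stubs chosen for this sketch; they are not the actual module interfaces, and the array sizes are placeholders.

```c
/* Structural sketch of the ARM program's per-frame loop (illustrative only). */
#include <stdbool.h>
#include <stdio.h>

typedef struct { unsigned char pixels[176 * 144]; } frame_t;   /* camera frame          */
typedef struct { unsigned short values[64]; } stimulus_t;      /* per-electrode outputs */

static bool camera_get_frame(frame_t *f)                 { (void)f; return true; }
static void dsp_process(const frame_t *f, stimulus_t *s) { (void)f; (void)s; }
static void output_send(const stimulus_t *s)             { (void)s; }
static void service_supervisor_and_bluetooth(void)       { }
static bool shutdown_requested(void)                     { return true; /* stub: run once */ }

int main(void)
{
    frame_t frame;
    stimulus_t stim;

    /* 1. initialize hardware/software resources (omitted in this sketch) */
    do {
        if (camera_get_frame(&frame)) {          /* 2. acquire a buffered frame   */
            dsp_process(&frame, &stim);          /* 3. delegate work to the DSP   */
            output_send(&stim);                  /* 4. forward stimulus commands  */
        }
        service_supervisor_and_bluetooth();      /* 5-6. remote control requests  */
    } while (!shutdown_requested());
    /* 7. release resources on shutdown (omitted) */

    puts("shutdown");
    return 0;
}
```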
2.4. Image processing
Image processing for vision prostheses requires a number of fundamental procedures, as outlined in Fig. 5. The first step involves acquiring an image from the camera. The second step performs several enhancement algorithms on the image. Because the image would most certainly have far more pixels than electrodes available on an implant, appropriate down-sampling and pixelization are necessary. In the final step the pixelized image is converted into stimulation commands for transmission to the prosthesis in the patient's eye.

We have been developing stimulation arrays whereby the electrodes are arranged as a hexagonal lattice [5]. This is demonstrated in the center window shown in Fig. 7. While the electrodes of our stimulating arrays have fixed positions relative to each other, clinicians can configure the image processor to sample points away from the default locations. This is achieved by assigning new sampling coordinates using the "sampling point re-mapping" utility in the remote supervising tool. The DSP program will then produce appropriate outputs by calculating the center of each sampling point using the offsets supplied by the user. Subjects can also zoom in or out (non-optical zoom performed in software), and pan the field of sampling points horizontally and vertically. Whenever a subject changes the sampling parameters, the Supervisor module of the ARM program
reconfigures the DSP program settings appropriately such that the changes are reflected in the next image processed. If changing sampling points were all that the subject could do, then an efficient and simple implementation would be easily achieved by storing the points in a lookup table. However, because users can dynamically zoom and pan the sampling field, the center of each point is calculated dynamically.

The camera produces images by impulse sampling, yielding values arranged in a two-dimensional array. Zooming will likely cause the sampling points to fall somewhere between the discrete points of the image array. Bilinear interpolation [18] is used to approximate the image value in such cases. Essentially, it approximates the value by taking a distance-weighted average of the sampling point's four nearest real neighbors, whose values are known from the image. For instance, suppose P = (x, y) is the sampling point and the values of its four nearest neighbors are Q1, Q2, Q3 and Q4. Then P's value is given by

$$P = Q_1(1 - x_r)(1 - y_r) + Q_2\,x_r(1 - y_r) + Q_3(1 - x_r)\,y_r + Q_4\,x_r y_r,$$

where $x_r = x - \lfloor x \rfloor$ and $y_r = y - \lfloor y \rfloor$ are the fractional parts of the coordinates. All coordinate calculations in the DSP program are carried out in floating point representation to minimize rounding errors.

If desired, image smoothing can be applied. Two smoothing algorithms are available. When 3 × 3 neighborhood averaging [18] is used, the returned value is simply the average of the sampling point and its surrounding eight neighbors.
Fig. 5 – Summary of the image-processing procedure. Images acquired by a camera are enhanced, pixelized, then converted into stimulation commands for transmission to the implanted device.
Table 1 – Performance (in frames per second) of smoothing schemes on various processing/computation platforms. The "Impulse" column represents the baseline value without smoothing. The "3 × 3" and "Gaussian" columns are the values derived when performing 3 × 3 neighborhood averaging and Gaussian-weighted averaging, respectively.

Platform                    Impulse   3 × 3   Gaussian
OMAP 5912 (ARM9 only)       24        3       0.5
OMAP 5912 (ARM9 + C55x)     30        21      9
PC (camera)                 30        30      30
PC (software)               54        51      48
Although less than optimal, it is fast to compute. Gaussian-weighted averaging [18] returns a value based on the weighted average of the center point and its surrounding points. The weighting is based on the profile of a two-dimensional Gaussian surface, which is approximated by a 5 × 5 convolution kernel in the spatial domain. The second algorithm is slower than the first due to the almost threefold increase in the number of processing points (see Table 1), but the resulting output is more representative of the original image because it takes more of the image into account. After sampling and smoothing, the value for each electrode can be further modulated by a scaling factor. The last operation that can optionally be performed is inverting the value of each sampling point to create a negative effect. In summary, Fig. 6 illustrates the procedures executed by the DSP program for each frame of camera image.
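As an illustration of the interpolation step described above, the following self-contained sketch computes the value of a fractional sampling point from an 8-bit grayscale image stored in row-major order. The image layout and function names are assumptions made for this example, not the image processor's actual code.

```c
/* Bilinear interpolation at a fractional point (x, y) of a grayscale image.
 * Q1..Q4 are the four surrounding pixels, weighted exactly as in the
 * expression given in section 2.4. */
#include <math.h>
#include <stdio.h>

static double bilinear(const unsigned char *img, int width, int height,
                       double x, double y)
{
    int x0 = (int)floor(x), y0 = (int)floor(y);
    int x1 = (x0 + 1 < width)  ? x0 + 1 : x0;    /* clamp at the image border */
    int y1 = (y0 + 1 < height) ? y0 + 1 : y0;
    double xr = x - x0, yr = y - y0;             /* fractional parts          */

    double q1 = img[y0 * width + x0];            /* (floor x, floor y)        */
    double q2 = img[y0 * width + x1];            /* (ceil x,  floor y)        */
    double q3 = img[y1 * width + x0];            /* (floor x, ceil y)         */
    double q4 = img[y1 * width + x1];            /* (ceil x,  ceil y)         */

    return q1 * (1 - xr) * (1 - yr) + q2 * xr * (1 - yr)
         + q3 * (1 - xr) * yr       + q4 * xr * yr;
}

int main(void)
{
    /* A tiny 2 x 2 test image: interpolating at its centre should give the
     * average of the four pixels, i.e. 100. */
    unsigned char img[] = { 0, 100, 100, 200 };
    printf("%.1f\n", bilinear(img, 2, 2, 0.5, 0.5));
    return 0;
}
```

The same routine is what a zoomed or panned sampling grid would call once per sampling point (or 25 times per point when the 5 × 5 Gaussian kernel is applied).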
3. Results

3.1. Image processing and stimulus generation
Fig. 7 is a screen capture of the remote supervising tool running on a Linux desktop PC. Arranged down the right-hand side are various controls for sending commands to the image processor. The child window towards the top left (partially covered) displays the raw images from the camera as well as the stimulus output for each electrode rendered graphically. The dialog in the center is the "sampling point re-mapping" utility described previously.

The zooming functionality is demonstrated in Fig. 8. For comparison, Fig. 8A is the view as seen by the camera. Fig. 8B is the default output without any zoom. Fig. 8C and D use zoom factors of 1.6 and 2.6, respectively. Smoothing was disabled in all three instances. The side profile of the monitor is barely visible in the default view due to aliasing. It becomes progressively clearer as the zoom factor increases.

Fig. 9 illustrates the effect of Gaussian smoothing. Fig. 9A is the view seen by the camera. The image was taken in a dark room with the desk lamp being the only source of light. Smoothing was disabled in Fig. 9B but enabled in Fig. 9C. The softening effect is clearly visible. Also available in the image processor is smoothing with 3 × 3 neighborhood averaging (not shown).

An attempt at using the image processor for locating and reading keys on a black keyboard with white print is shown in Fig. 10.
Fig. 6 – Image-processing steps of the DSP program. The program begins by determining the center sampling point for each electrode. Fine-tuning of the stimulus intensity of each electrode, and selection of image smoothing methods are possible. In addition, the final output can be inverted to produce a negative effect not unlike those in photography.
In Fig. 10A the camera was fixated on the 'B' key. Without any zoom the white letter appeared as a dot in one of the phosphene pixels in Fig. 10B. The camera was then zoomed in 5.3 times in Fig. 10C, where the outline of the character is clearly visible. The negative effect is demonstrated in Fig. 10D. In some situations it enhances image contrast. Smoothing was not used in any of these cases.
3.2. System performance
In our implementation the smoothing algorithms have the greatest impact on the overall system performance. For this reason we have benchmarked the speed of the system with: smoothing off, 3 × 3 neighborhood averaging, and Gaussianweighted averaging. The unit of measure is frames per second (FPS). Table 1 summarizes the results on different hardware platforms. An identical camera was used in all instances to provide an objective comparison. The camera is only able to acquire 30 frames per second at 176 × 144 pixels resolution. Thus the maximum FPS achievable by the image processor is limited to 30 as well. In all cases the ARM9 microprocessor and the C55x DSP were both clocked at 192 MHz.
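For readers wishing to reproduce such figures, the measurement itself is straightforward: time a fixed number of processed frames and divide. The sketch below shows one simple way to do this; process_frame() is a stand-in for the acquisition-plus-processing pipeline (here it merely sleeps to simulate roughly 33 ms of work), and it is not the benchmarking code used for Table 1.

```c
/* Illustrative frames-per-second measurement by timing a batch of frames. */
#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

static void process_frame(void)
{
    usleep(33000);                        /* placeholder for real per-frame work */
}

int main(void)
{
    const int n_frames = 100;             /* average over many frames to smooth jitter */
    struct timeval start, end;

    gettimeofday(&start, NULL);
    for (int i = 0; i < n_frames; ++i)
        process_frame();
    gettimeofday(&end, NULL);

    double seconds = (end.tv_sec - start.tv_sec)
                   + (end.tv_usec - start.tv_usec) / 1e6;
    printf("%.1f frames per second\n", n_frames / seconds);
    return 0;
}
```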
Fig. 7 – The remote supervising tool. Researchers and clinicians have full control of the image processor through this tool. It runs on a standard PC with an Ethernet connection.
The first row shows the result of using only the ARM9TDMI microprocessor on the OMAP. The values were obtained by re-implementing the entire DSP program within the ARM program. The second row represents the default configuration, namely the image processor setup as described in this paper. For comparison, both the ARM and DSP programs were ported to an x86 machine. The FPS for a Pentium 4 2.13 GHz desktop machine with 1 GB of RAM is shown in the last two rows of the table. The third row shows the speed when images were acquired with the camera used on the image processor; the FPS was limited by the throughput of the camera (30 frames per second). To obtain the absolute speed of the software when running on a desktop PC, we arranged for pre-recorded images to be fed into the program. Without the camera speed bottleneck the program was able to deliver approximately 50 FPS, as listed in the last row.
3.3. Power consumption

Table 2 provides power consumption measurements under various conditions. The values are the total current draw of the image processor as measured at the 5 V power input socket. To avoid transient fluctuations, the measurements were recorded and averaged over 5 min of execution. In all cases the Ethernet cable was plugged in with the remote supervisor connected, and the image processor was also connected to the development PC via the RS232 serial port. The "Idle" condition is defined as when the device has booted up and is standing by to begin image processing. Under this condition, the CPU cycles and memory requirements are primarily dictated by the operating system's needs.

Table 2 – Power consumption of the image processor under different conditions. The current values were measured at the power input socket of the device and averaged over 5 min. The second and third columns indicate whether the camera and Bluetooth module were attached to the device, respectively.

Condition     Camera   Bluetooth   Current (mA)
Idle          No       No          60
Idle          Yes      No          340
Idle          Yes      Yes         360
Processing    Yes      Yes         425
Fig. 8 – Zooming with the image processor. The effect of zooming at 1× (B), 1.6× (C) and 2.6× (D) is demonstrated. The original image (A) is shown at top left for comparison. Notice that the left portion of the monitor becomes more distinguishable as zoom factor increases.
It is worth highlighting the amount of power the USB webcam used. As indicated by the first and second rows of Table 2, when connected it consumed several times the combined power requirement of all remaining electronics on the device. Under the "Processing" condition (last row), the image processor was converting camera images into stimulus commands and performing Gaussian smoothing to simulate a real-life scenario. With the current hardware configuration and a battery similar to those available for laptops (e.g. a 5110 mAh lithium-ion battery), the image processor can be expected to operate continuously for over 12 h.
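As a rough check of this figure (ignoring voltage conversion losses and battery derating, which are assumptions of this back-of-the-envelope estimate rather than details given in the measurements), the runtime follows directly from the measured current draw under the "Processing" condition:

$$t \approx \frac{5110\ \text{mAh}}{425\ \text{mA}} \approx 12\ \text{h}.$$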
4. Discussion

4.1. Choice of platform and development tools
Previous quantitative performance measurements of image-processing algorithms pertaining to vision prostheses suggest that DSPs may be essential for a successful image processor implementation [19]. While they excel at mathematically intensive computations, DSPs generally have minimal support for peripheral hardware and are certainly not designed to run general-purpose operating systems. For these two reasons, the Texas Instruments OMAP microprocessors are ideal off-the-shelf candidates. Within one chip, the DSP core can perform all numerical computations while the ARM core handles all other tasks such as communication and peripheral
device control. While many high-performance ultra-portable PCs currently available will undoubtedly have enough processing power, as suggested by Table 1, they would be far more cumbersome to carry around, and are certainly by no means wearable. A number of kernels exist for use in embedded devices, ranging from the large monolithic types like Linux, to the small application specific microkernels. Our choice to use Linux was prompted primarily by three reasons: 1. By building on the existing code-base available for Linux, the device can support a large feature set from the inception. This is an advantage for devices that are undergoing constant research and development, and where the clinical requirements are as yet not thoroughly understood. As the device matures over time, unnecessary features can then be excluded. 2. The Linux kernel has extensive support for the ARM architecture. 3. Drivers for a large number of peripheral devices currently exist for Linux. With the Linux operating system as the basis, a variety of programming languages may be used to implement the ARM program. The C language was chosen for a balance between system performance and development effort. Due to the minimalist nature of the device, both hardware- and
Fig. 9 – Gaussian smoothing. The original image is shown in (A). Outputs without and with Gaussian smoothing are shown in (B) and (C). Averaging gives the output a soft appearance.
software-wise, it was difficult to program from within the device itself. Instead, the GNU GCC cross-compilation tool chain for the ARM platform was used to compile the source code on a PC-based development machine. The resulting executable binaries were then transferred onto the device. All the system-level libraries and utilities required for basic functions of the image processor were also produced in the same way. A number of time-critical sections of the DSP program were hand-optimized using assembly code, while the remainder was written in C. In both cases the C55x C compiler and assembler provided by Texas Instruments were used to generate the binary. The remote supervisor was written in C++ with Qt version 4 (Qt Software, Oslo, Norway), a graphical user interface toolkit, chosen primarily for its extensive cross-platform support. The authors developed and used the remote supervisor on Linux as a matter of convenience. It can, however, be ported to other popular operating systems with ease.
4.2. Implementation considerations
Each raw image from the camera is on the order of 50 kB in size, packed into a contiguous stretch of 8-bit memory blocks. This is problematic for a number of reasons. The smallest data types supported by the ARM architecture and the TMS320C55x architecture are 8 and 16 bits, respectively. Thus either the ARM program should rewrite all images into 16-bit blocks before
sending them to the DSP, or the DSP program should use bit masking in software whenever reading 8-bit blocks from the image. We have chosen the former technique. Rewriting images into 16-bit blocks doubles the amount of memory required, and the most significant 8 bits of each 16-bit block are always set to 0. However, this adds a constant – hence predictable – amount of run time per image frame processed, a particularly important consideration for real-time systems such as the image processor. Furthermore, the DSP program implements a series of image-processing filters, each of which requires numerous read operations into the image. Performing a bit masking operation for every read would add significant overhead and hence degrade performance.

The DSP has 32k words of dual-access RAM and 48k words of single-access RAM, each word being 16 bits in length. Image rewriting as described above doubles the memory requirement; clearly, complete images (approximately 100 kB after rewriting) cannot fit into the DSP's RAM. It is also preferable to keep as much of the DSP's RAM free as possible, because many image-processing algorithms require memory for storing intermediate results during computation. To resolve the issue we used a feature available on the OMAP whereby the SDRAM normally only used by the ARM core is mapped into the virtual address space of the DSP. In other words, by sharing some of the system RAM used by the ARM processor, one can extend the amount of memory available to the DSP. In theory, up to 64 MB of SDRAM can be mapped into the DSP.
Fig. 10 – Reading a keyboard character. The processing parameters can be tuned to enhance one's ability to read fine print, demonstrated here with keyboard labels. (A) The scenery as seen by the camera. (B) The fixated keyboard character is not recognizable without any zoom. (C) Zooming at 5.3×, the letter is easily readable. (D) At the same zoom level, but with negative effect turned on.
But for reasons related to the C compiler supplied by Texas Instruments and the Memory Management Unit on the DSP, only 128 kB can be shared in the current setup. Nevertheless, this is more than adequate. Sharing memory also provides an additional benefit: it is no longer necessary to copy data from the SDRAM into the DSP's RAM for each image to be processed, thus improving the overall efficiency of the program.

As mentioned previously, the ARM and DSP cores have separate clocks and run concurrently. The two programs were arranged to coordinate access to the shared memory to ensure data consistency. Access to the shared memory is arbitrated by a signaling mechanism. It would be wasteful to allow the ARM program to sit idle while the DSP is working. Therefore the ARM program is arranged to perform image acquisition I/O during this time. This is a relatively lengthy process, taking up to several tens of milliseconds per frame. Furthermore, the ARM program buffers two frames ahead so that occasional delays in hardware I/O do not unnecessarily stall the rest of the system.

To prevent hardware I/O delays from the Bluetooth module slowing down the entire system, it is implemented as a separate thread. On multi-CPU systems, threads allow several parts of a program to be executed concurrently by different CPUs, thus increasing throughput. On the image processor there is only one ARM core. However, benefit is derived from allowing the kernel to suspend execution of the Bluetooth module
whenever it blocks waiting for hardware I/O, and to run another, previously suspended, thread (for example, the main module) in the meantime. Consequently, delays caused by Bluetooth do not affect the remainder of the program.

The DSP program uses the Memory Allocator module (see Fig. 4) to manage the SDRAM it shares with the ARM program. This is necessary because the memory allocation functions provided by the C standard library can only allocate from the DSP's own internal RAM. A custom implementation can also exploit the memory-usage characteristics of the DSP program to provide a level of efficiency that would not be possible with a general-purpose implementation. The current memory allocator operates by returning the first stretch of free memory it can find. No attempt is made to minimize memory fragmentation and no compaction algorithm is used. In most situations this would quickly lead to small "gaps" in the free memory pool as numerous allocations and deallocations of varying sizes are made; without further treatment, allocation would eventually fail even though free memory remains. However, the current implementation works for the image processor and is extremely fast, because it is known a priori that for every image frame processed, three or fewer memory allocation requests will be made, each for a relatively large block of memory on the order of tens of kilobytes. Moreover, when the program completes a frame, the allocator state is reset and the next frame starts the process afresh.
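The behaviour just described is close to that of a per-frame "arena" allocator: hand out memory sequentially from a fixed pool and reclaim everything at once when the frame is finished. The sketch below illustrates the idea under those assumptions; the pool size, names and the absence of alignment handling are simplifications, and the real Memory Allocator module additionally performs a first-fit search over its pool.

```c
/* Illustrative per-frame arena allocator: sequential allocation from a
 * fixed pool, with a wholesale reset once the frame has been processed. */
#include <stddef.h>
#include <stdio.h>

#define POOL_SIZE (128 * 1024)          /* hypothetical pool size */

static unsigned char pool[POOL_SIZE];
static size_t used = 0;

static void *frame_alloc(size_t bytes)
{
    if (used + bytes > POOL_SIZE)
        return NULL;                    /* out of pool space for this frame */
    void *p = &pool[used];
    used += bytes;
    return p;
}

static void frame_reset(void)
{
    used = 0;                           /* frame finished: reclaim everything at once */
}

int main(void)
{
    for (int frame = 0; frame < 3; ++frame) {
        /* a few large, short-lived allocations per frame, as in the DSP program */
        void *raw  = frame_alloc(50 * 1024);
        void *wide = frame_alloc(50 * 1024);
        printf("frame %d: raw=%p wide=%p used=%zu bytes\n", frame, raw, wide, used);
        frame_reset();
    }
    return 0;
}
```

Because at most a handful of large allocations are made per frame and the whole pool is reset afterwards, fragmentation never has a chance to accumulate, which is why the simple strategy is both safe and fast in this setting.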
Therefore the DSP program never reaches the stage where repeated allocations and deallocations may start to cause problems.
4.3. Image processing
Electrical stimulation of the retina will elicit activity in multiple ganglion cells and possibly other cell types in layers of the retina [20,21], which in turn may lead to the activation of ganglion cells at sites distant from the intended position. Consequently the position of any perceived phosphene(s) may be displaced from the intended location. The "sampling point re-mapping" tool corrects any such discrepancies. As the image processor acquires images from the camera and turns them into electrical stimuli, it samples each image at locations that, by default, correspond to the arrangement of the stimulation electrode array. If it is subsequently found that the subject's perceived phosphene position does not match the physical position of the light stimulus (as captured by the camera), then the sampling point can be moved with the re-mapping tool. An example is demonstrated in Fig. 11.

Bilinear interpolation was chosen because it is reasonably inexpensive to calculate. Depending on the level of scaling, images up-scaled (zoomed in) using bilinear interpolation suffer from some blurring, jagged edges (aliasing) and sometimes halo effects around edges. When used for down-scaling (zooming out), moiré patterns tend to occur (another form of aliasing). It is anticipated that the number of electrodes will increase substantially in the not-too-distant future. As such, the scaling implementation on the current image processor may no longer be ideal with respect to the resolution offered by newer devices. A number of alternative algorithms are known to produce images of better quality, but they tend to consume more resources than bilinear interpolation. One can also reduce interpolation artifacts by increasing the resolution of the camera output at the expense of higher memory consumption, which grows roughly with the square of the image dimensions. However, given the rather limited number of electrodes available on current vision prostheses, the computational overhead is not warranted.

Scaling of individual sampling points is motivated by concerns regarding variations at the electrode–tissue interface after implantation. Electrodes with non-ideal positioning may have higher impedance, hence requiring stronger intensity levels to evoke the same response. Sometimes an electrode may fail to elicit an action potential regardless of stimulus intensity. In such situations the electrode can be disabled altogether by setting its scaling factor to 0.

It has been suggested that user-controllable zooming will help to increase the efficacy of vision prostheses, especially when only a limited number of electrodes are available [22]. This is not surprising, since zooming allows the sampling density to be matched to the dimensions of the visual field of interest such that aliasing is reduced to a level where recognition becomes possible. Psychophysical studies by Fu et al. [23] involving reading printed text support this assertion. Here it can be seen in Fig. 10 that the readability of a keyboard character is enhanced by filling the electrode array with the character of interest at a zoom factor of 5.3×.
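To illustrate how the re-mapping offset, per-electrode scaling and optional inversion described above can combine for a single electrode, the following sketch computes one stimulus value. All names, the 8-bit value range and the constant returned by sample_image() are assumptions made for this example, not the actual DSP code (where sample_image() would be the bilinear interpolation of the camera image).

```c
/* Illustrative per-electrode stimulus computation: remap, sample, scale, invert, clamp. */
#include <stdio.h>

typedef struct {
    double offset_x, offset_y;   /* re-mapping offset added to the default location */
    double scale;                /* per-electrode intensity scaling, 0 disables it   */
} electrode_cfg;

/* Stand-in for bilinear interpolation of the camera image at (x, y). */
static double sample_image(double x, double y) { (void)x; (void)y; return 200.0; }

static double electrode_value(double default_x, double default_y,
                              const electrode_cfg *cfg, int invert)
{
    double v = sample_image(default_x + cfg->offset_x,   /* corrected sampling point */
                            default_y + cfg->offset_y);
    v *= cfg->scale;                                     /* per-electrode scaling     */
    if (invert)
        v = 255.0 - v;                                   /* optional negative effect  */
    if (v < 0.0)   v = 0.0;                              /* clamp to the value range  */
    if (v > 255.0) v = 255.0;
    return v;
}

int main(void)
{
    electrode_cfg cfg = { 1.5, -0.5, 0.8 };              /* example offset and scale  */
    printf("stimulus value: %.1f\n", electrode_value(40.0, 30.0, &cfg, 0));
    printf("inverted:       %.1f\n", electrode_value(40.0, 30.0, &cfg, 1));
    return 0;
}
```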
Fig. 11 – Correction of positional discrepancy between a camera image sampling point and a phosphene position. Coordinates for the Image Space and Electrode Space are written in capital and small letters, respectively. (A) In the ideal case, the subject sees the phosphene at the location where the light stimulus appears. (B) It is possible that the stimulating electrodes may activate cells at slightly different location ((x3 , y3 ) instead of (x1 , y1 )), causing the wrong phosphene to be generated for that location. (C) The re-mapping tool adds an offset (I, J) to the default sampling location so as to generate the correct phosphene.
Buffoni et al. proposed that maximum control should be given to the patient [15]. We fulfill this requirement by letting patients control the image processor using a Bluetooth-enabled cell phone, which in essence functions as an omni-directional remote control. The limited number of buttons and their simple arrangement allow patients to issue commands by "feeling" for the right key with their fingertips. For example, the joystick (four directions of movement) controls the zoom level and camera panning, while the numeric keypad toggles various image-processing effects. This said, a purpose-built remote control could also be constructed and interfaced with the image processor using Bluetooth.
4.4. System performance
Comparing the first two rows of Table 1, it is clear that the TMS320C55x DSP improves performance. As expected, the more mathematical operations performed, the greater the gain. ARM9TDMI plus TMS320C55x performs 19 times
better than with ARM9TDMI alone when Gaussian-weighted smoothing is used. Compared with some of the image-processing techniques proposed for vision prostheses, the real-time algorithms used in the present device are demanding for embedded systems but by no means complex. Therefore DSPs will be of great benefit, if not crucial, for most implementations.

Our algorithms are able to deliver approximately 50 frames per second on a desktop PC. Incidentally, Asher and colleagues [16] reported similar throughput on a PC, even though their algorithms are very different. Their image pre-processing algorithms were developed in Matlab while the time-critical components were written in C. A throughput of 50 frames per second or higher was achieved using a Pentium 2.6 GHz PC. Although untested, they believe their implementation is capable of providing 25 frames per second or more when ported to a pocket PC. It is significant, however, that when demanding processing is performed, our apparently fast program (on a PC) in fact performs sub-optimally when executed on an embedded device. One should therefore exercise caution when using desktop PCs to estimate the performance of programs intended for use on embedded systems.

Powered by a Texas Instruments OMAP microprocessor, the wearable image processor achieved satisfactory performance under most configurations. When image smoothing is performed by way of convolution with a 5 × 5 Gaussian kernel, the throughput dropped to approximately 10 frames per second. While still usable, this is no longer ideal. The reduction in performance is due to dynamic zooming: with a 5 × 5 kernel, the DSP needs to compute 25 bilinear interpolations in order to approximate the stimulus value for an electrode, whereas impulse sampling needs only one. However, the independent nature of these operations means that they can be performed in parallel. Additionally, in the current implementation the DSP program never modifies the raw camera image acquired. These two characteristics imply that if the image processor were to employ multiple DSPs, significant performance improvements – approaching 30 frames per second, as limited by the camera CCD circuitry – could be expected.

An alternative to the multi-DSP scheme is to use a more powerful DSP, such as a processor belonging to the Texas Instruments TMS320C6x family. Although much more capable, these DSPs also consume prohibitive amounts of power. A TMS320C6711, for example, requires as much as 1.1 W, which can be anywhere from 7 to 17 times more than a TMS320C55x performing the equivalent algorithm [19]. If an image processor were built with such a DSP, the battery requirement and the weight would make it rather inconvenient to use.

A consequence of a low frame rate is that patients will receive intermittent stimulus delivery. A possible remedy involves delivering identical stimuli multiple times while the image processor is computing the next frame. It should be noted, however, that while this technique ensures constant visual perception, the patient would still experience latency due to processing delays.
4.5. Regulatory considerations

Besides general good software engineering techniques, no particular software process life cycle and associated steps were rigorously followed during the development of the current wearable image processor. These, however, will be essential for any device intended for clinical trials in vision-impaired patients. While the use of open source software (Linux and associated device drivers) has many advantages, as described previously, it poses special regulatory clearance considerations. Regulatory agencies such as the FDA generally decree that it is the device manufacturer's responsibility to ensure the safety of all components, including software, acquired from a third party [24]. Because Linux is a system designed primarily for general-purpose computing, where the consequences of failure are likely to be more tolerable, the level of correctness verification by the original developers may not suffice for medical devices. Additionally, unused kernel features increase the complexity of the system, and thus the likelihood of faults. While Linux is open for white box testing without restrictions, the process can be cumbersome due to the large code base. In recent years a number of open source kernels designed specifically for mission- and life-critical embedded systems have appeared. One of these, seL4 [25], is particularly promising for devices such as the present image processor. Its support for the ARM architecture is already in place, it has a small code base and, most important of all, a project is currently in progress to mathematically verify the correctness of the entire kernel [26]. An increasing number of vision prosthesis prototypes are progressing towards human trials. Considerations like these will become even more pertinent for those involved in the design and development process.

5. Conclusion

We have demonstrated the implementation of an image processor for use with vision prostheses, with particular emphasis on the approach taken to maximize software performance. The results indicate that a highly flexible and configurable device can be built using an OMAP microprocessor with a dual-core architecture. However, depending on the image-processing algorithms involved, more powerful signal processing hardware than that available on the OMAP may be required to deliver real-time performance. Furthermore, general-purpose embedded microprocessors alone (for example, ARM) are unlikely to be adequate for implementing the image-processing strategies required by vision prostheses. Our benchmark comparisons also indicate that care should be taken when using desktop PCs to judge the performance of algorithms intended for use on embedded devices, which are constrained by processing capability, memory availability, power consumption, size, and weight.

Little work on image-processing devices for vision prostheses has been reported to date. Given the importance of image processing, and the complex nature of some of the proposed techniques, the design and implementation of such devices is a non-trivial task requiring careful trade-offs between several conflicting requirements. More effort should be devoted to this important topic.

Conflict of interest statement

None declared.
Acknowledgements

We thank Philip Byrnes-Preston for expert technical assistance. This research was supported by funding from the Australian Federal Government (Department of Education, Science, and Training and the Australian Research Council) and Retina Australia.
References
[1] M.S. Humayun, J.D. Weiland, G.Y. Fujii, R. Greenberg, R. Williamson, J. Little, B. Mech, V. Cimmarusti, G.V. Boemel, G. Dagnelie, E. de Juan Jr., Visual perception in a blind subject with a chronic microelectronic retinal prosthesis, Vision Research 43 (2003) 2573–2581.
[2] W. Liu, M. Sivaprakasam, P.R. Singh, R. Bashirullah, G. Wang, Electronic visual prosthesis, Artificial Organs 27 (2003) 986–995.
[3] Y. Pan, T. Tokuda, A. Uehara, K. Kagawa, M. Nunoshita, J. Ohta, Flexible and extendible neural stimulation device with distributed multichip architecture for retinal prosthesis, Japanese Journal of Applied Physics 44 (2005) 2099–2103.
[4] Y.T. Wong, N. Dommel, P. Preston, L.E. Hallum, T. Lehmann, N.H. Lovell, G.J. Suaning, Retinal neurostimulator for a multifocal vision prosthesis, IEEE Transactions on Neural Systems and Rehabilitation Engineering 15 (2007) 425–434.
[5] N.H. Lovell, L.E. Hallum, S. Chen, S. Dokos, P. Byrnes-Preston, R. Green, L. Poole-Warren, T. Lehmann, G.J. Suaning, Advances in retinal neuroprosthetics, in: M. Akay (Ed.), Handbook of Neural Engineering, Wiley/IEEE Press, Hoboken, NJ, USA, 2007 (Chapter 20).
[6] K. Cha, K. Horch, R.A. Normann, Mobility performance with a pixelized vision system, Vision Research 32 (1992) 1367–1373.
[7] J.S. Hayes, V.T. Yin, D. Piyathaisere, J.D. Weiland, M.S. Humayun, G. Dagnelie, Visually guided performance of simple tasks using simulated prosthetic vision, Artificial Organs 27 (2003) 1016–1028.
[8] R.W. Thompson Jr., G.D. Barnett, M.S. Humayun, G. Dagnelie, Facial recognition using simulated prosthetic pixelized vision, Investigative Ophthalmology and Visual Science 44 (2003) 5035–5042.
[9] X. Chai, W. Yu, J. Wang, Y. Zhao, C. Cai, Q. Ren, Recognition of pixelized Chinese characters using simulated prosthetic vision, Artificial Organs 31 (2007) 175–182.
[10] J. Sommerhalder, E. Oueghlani, M. Bagnoud, U. Leonards, A.B. Safran, M. Pelizzone, Simulation of artificial vision. I. Eccentric reading of full-page text and the learning of this task, Vision Research 43 (2003) 269–283.
[11] J. Sommerhalder, B. Rappaz, R. de Haller, A.P. Fornos, A.B. Safran, M. Pelizzone, Simulation of artificial vision. II. Eccentric reading of full-page text and the learning of this task, Vision Research 44 (2003) 1693–1703.
[12] J.R. Boyle, A.J. Maeder, W.W. Boles, Challenges in digital imaging for artificial human vision, in: Proceedings of the SPIE, vol. 4299, 2001, pp. 533–543.
[13] S. Chen, L.E. Hallum, N.H. Lovell, G.J. Suaning, Visual acuity measurement of prosthetic vision: a virtual–reality simulation study, Journal of Neural Engineering 2 (2005) 135–145.
[14] L.E. Hallum, G.J. Suaning, D.S. Taubman, N.H. Lovell, Simulated prosthetic visual fixation, saccade, and smooth pursuit, Vision Research 45 (2005) 775–788.
[15] L. Buffoni, J. Coulombe, M. Sawan, Image processing strategies dedicated to visual cortical stimulators: a survey, Artificial Organs 29 (2005) 658–664.
[16] A. Asher, W.A. Segal, S.A. Baccus, L.P. Yaroslavsky, D.V. Palanker, Image processing for a high-resolution optoelectronic retinal prosthesis, IEEE Transactions on Biomedical Engineering 54 (2007) 993–1004.
[17] T. Kobayashi, Linux DSP Gateway Specification Rev 3.3, Nokia Corporation, 2005.
[18] R.C. Gonzalez, R.E. Woods, Digital Image Processing, 2nd ed., Prentice Hall, New Jersey, USA, 2002.
[19] N.J. Parikh, J.D. Weiland, M.S. Humayun, S.S. Shah, G.S. Mohile, DSP based image processing for retinal prosthesis, in: Proceedings of the 26th Annual International Conference of the IEEE EMBS, vol. 2, 2004, pp. 1475–1478.
[20] R.J. Jensen, J.F. Rizzo III, Thresholds for activation of rabbit retinal ganglion cells with a subretinal electrode, Experimental Eye Research 83 (2006) 367–373.
[21] R.J. Jensen, J.F. Rizzo III, Responses of ganglion cells to repetitive electrical stimulation of the retina, Journal of Neural Engineering 4 (2007) 1–6.
[22] L.E. Hallum, G. Dagnelie, G.J. Suaning, N.H. Lovell, Simulating auditory and visual sensorineural prostheses: a comparative review, Journal of Neural Engineering 4 (2007) 58–71.
[23] L. Fu, S. Cai, H. Zhang, G. Hu, X. Zhang, Psychophysics of reading with a limited number of pixels: towards the rehabilitation of reading ability with visual prosthesis, Vision Research 46 (2006) 1292–1301.
[24] Office of Device Evaluation, Center for Devices and Radiological Health, Food and Drug Administration, General Principles of Software Validation; Final Guidance for Industry and FDA Staff, 2002.
[25] G. Heiser, K. Elphinstone, I. Kuz, G. Klein, S.M. Petters, Towards trustworthy computing systems: taking microkernels to the next level, ACM Operating Systems Review 41 (2007) 3–11.
[26] K. Elphinstone, G. Klein, P. Derrin, T. Roscoe, G. Heiser, Towards a practical, verified kernel, in: Proceedings of the 11th Workshop on Hot Topics in Operating Systems, San Diego, CA, USA, 2007.