Tracking zebrafish larvae in group – Status and perspectives

Methods 62 (2013) 292–303
http://dx.doi.org/10.1016/j.ymeth.2013.05.002

Pierre R. Martineau a,b,*, Philippe Mourrain a,c,*

a Department of Psychiatry and Behavioral Sciences, Center for Sleep Sciences, Beckman Center, Stanford University, Palo Alto, CA 94305, USA
b Martineau & Associates, Menlo Park, CA 94025, USA
c Inserm 1024, Ecole Normale Supérieure, 75005 Paris, France

* Corresponding authors. Address: Stanford University, School of Medicine, Beckman Center, 279 Campus Drive, Room B201 CCSR Room 2245b (South Wing), Stanford, CA 94305, USA. E-mail addresses: [email protected] (P.R. Martineau), [email protected] (P. Mourrain).

Article history: Available online 24 May 2013

Keywords: Zebrafish; Tracking; Motion; Representations; Sampling; Processing; Articulated and skeleton models; Frame-to-frame association

Abstract

Video processing is increasingly becoming a standard procedure in zebrafish behavior investigations, as it enables higher research throughput and new or better measures. This trend, fostered by the ever-increasing performance-to-price ratio of the required recording and processing equipment, should be expected to continue in the foreseeable future, with video-processing-based methods permeating more and more experiments and, as a result, expanding the very role of behavioral studies in zebrafish research. To assess whether the routine video tracking of zebrafish larvae directly in the Petri dish is a capability that can be expected in the near future, the key processing concepts are discussed and illustrated with published zebrafish studies where available, and with studies of other animals where not.

© 2013 The Authors. Published by Elsevier Inc. All rights reserved. This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial-No Derivative Works License, which permits non-commercial use, distribution, and reproduction in any medium, provided the original author and source are credited.

1. Introduction

For years, the zebrafish has been a promising system for behavioral sciences. Visually guided behaviors such as the optokinetic reflex, the optomotor response, the escape response, and prey capture, and the corresponding retino-tectal and spinal cord circuits, have been extensively investigated in zebrafish for more than a decade (reviewed in [1]). In addition to these visual and locomotor behaviors, the neurosciences community has grown increasingly interested in the use of zebrafish for more sophisticated and complex behaviors involved in psychiatric disorders, such as drug addiction and withdrawal [2–6], aggressiveness [7,8], fear and anxiety (reviewed in [9]), social interaction [10,11], learning and memory [12,13], and circadian and sleep-wake behaviors [14–17]. As a vertebrate, zebrafish also has the advantage of sharing with mammals similar central nervous system organization and circuitry underpinning behaviors. The principal neurotransmitters found in mammals are largely conserved in zebrafish, including amino acids (glutamate, GABA, glycine) [18], monoamines (histamine, dopamine, norepinephrine, epinephrine, serotonin, melatonin) [19–21], and acetylcholine [22], consistent with the conserved responses of zebrafish to various drugs targeting the nervous system [2,23,24]. Importantly, zebrafish harbor most of the neuropeptides present in mammalian nervous systems (e.g. [17,25]). The zebrafish thus represents an extremely attractive system for understanding the molecular and neuronal substrates of behaviors.

Zebrafish behavior assessment methods are relevant in an increasing number of experimental contexts as a result of the growing usability of video processing. Not only does video processing allow for measure automation and increased accuracy, leading to higher research throughput, it also allows the definition of entirely new measures based on features that would not be detectable or countable by manual methods. Our aim is to identify and discuss the key processing components and challenges relevant to the video-based macroscopic observation of free-swimming zebrafish at the larval stage in laboratory conditions. We define the larval stage as running from 3 dpf to the 4th week; the earlier embryo stage is out of scope here, as it usually involves different setups and toolkits.

2. Tracking zebrafish larvae motion


Small size, body transparency, and discontinuous kinematics combine to make tracking individual larvae in a Petri dish notoriously difficult. At the time of this writing, we know of no off-the-shelf system that supports tracking zebrafish larvae under such unrestricted conditions. As a result, a large majority of studies physically segregate fish, typically by placing them in individual wells of a multi-well plate.


Tracking is an intuitive and pervasive concept. Simply counting zebrafish in a clutch requires tracking: keeping track of individual fish while the count is taken. One could safely assert that all behavioral studies, not simply video-based studies, involve some form of tracking. It is no surprise, then, that the term is ubiquitous in the video-based behavior assessment literature, even when restricted to zebrafish. This abundance can be illustrated by querying Google Scholar: a query with the three words zebrafish, video, and tracking returned approximately 2850 records in February 2013.

Comparing studies is difficult unless we dive into the processing details and define some basic comparison or performance criteria. Regarding the latter, we will pay particular attention to the reliability of the tracking output (accuracy of kinematic measures and error estimation, integrity of the system vis-à-vis potential swaps of fish identity), the theoretical basis of the tracking approach, and the cost, in terms of processing power, at which the tracking data is extracted.

In video processing, multi-target tracking – the process by which several moving objects are tracked from frame to frame – is usually characterized by three major groups of tasks performed on each frame in two consecutive steps. In a first step, object features are extracted from the image (feature extraction) and are processed into data structures that describe the various detected objects (target representation). In a second step, structures representing objects across consecutive frames are associated so that the identity of each object can be traced and various kinematic calculations can be performed.

In everyday life, video is recorded in color; for describing the tracking process, however, we will consider that video is recorded in black and white, producing sequences of gray-level images, as is the case with most zebrafish studies. The few studies that use color video either process each color channel as a separate gray-level image, select the best color channel in each phase of the experiment and process images from that channel only [26], or reduce the color planes to a single gray-level image via linear combination [27]. Before we examine the tracking process, we need to examine tracking's raison d'être: zebrafish larva motion.

3. Zebrafish larva motion

3.1. Movement

Zebrafish larva motion kinematics have been studied in detail [28–31]. These studies are based on the stop-motion technique which, as noted by [31], was pioneered in the late 19th century by the photographer Eadweard Muybridge to investigate the details of animal locomotion. They have characterized a small set of elementary motion building blocks that can be assembled in sequences expressing more complex locomotory patterns. For example, the familiar larva escape movement (fast startle response in [29]) can be precisely defined as a stereotypical sequence initiated by a C-start and followed by a counter-bend and an episode of cyclic swimming. Table 1 shows seven easy-to-characterize elementary patterns adapted from the list of nine basic maneuvers described in [32], although one should keep in mind that such categorizations are not absolute or problem-free, particularly when analyzing mutants.

3.2. Immobility


From a kinematics point of view, detecting immobility is very important because larva motion shows highly discontinuous acceleration pulses (we will discuss the impact of these acceleration discontinuities when we review the mathematics justifying the various tracking approaches). The most significant of these discontinuities are found at the transition between immobility and movement. By removing all episodes of immobility, however brief, one may hope to extract a set of motion segments during each of which second-order kinematics is continuous. Therefore, precise detection of immobility is crucial.

Immobility is important for biological reasons as well; it is used to characterize sleep [9,15,16,33], inactivity, and freezing behavior. However, most methods, including commercial larva tracking systems, assess immobility on the basis of displacement thresholds (in a notable exception [38], the authors perform spectral analysis of the larva's tail curvature variations to assess immobility more precisely). Such thresholds cannot detect immobility precisely, because larvae frequently produce movements of zero or very low displacement amplitude. Examples of such movements include very low amplitude scoots (see Table 1 and Fig. 1G) and zero-displacement routine turns (Fig. 1G). Immobility episodes are themselves very frequent in larvae (found in 99.7% of all 267 ms time bins in Fig. 1F), and transitions from motion to immobility are typically very brief due to water viscosity.

Active quasi-immobility is another example of an easily observed, potentially significant behavior that is hard to detect with displacement-threshold methods. Fig. 1A–C shows an example of a freezing behavior that could not be detected via such a threshold: during the episode, the fish make frequent immobility-maintenance movements that produce displacement amplitudes larger than some of the in-place prey capture movements exhibited during the highly active feeding phase. While the contrast in overall movement provides clues to recognize the freezing episode of the juvenile fish in Fig. 1B, freezing is much more difficult to characterize at the larval stage.
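To make the displacement-threshold problem concrete, the sketch below implements the test that such methods reduce to; the function name and threshold value are illustrative, not taken from any cited system:

```python
import numpy as np

def immobility_by_displacement(centers, px_per_mm, thresh_mm=0.1):
    """Flag frame intervals as immobile when the frame-to-frame displacement
    of the tracked fish center falls below a distance threshold.

    centers: (N, 2) array of per-frame center coordinates, in pixels.
    Returns a boolean array of length N - 1 (one value per interval).
    """
    disp_px = np.linalg.norm(np.diff(centers, axis=0), axis=1)
    return disp_px / px_per_mm < thresh_mm

# Failure modes discussed above: a zero-displacement routine turn leaves the
# center nearly fixed and is wrongly flagged as immobility, while immobility-
# maintenance movements during freezing can exceed the threshold.
```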

4. Fish representation

Targets are represented computationally by sets of parameters. In point representations, these parameters are the spatial (three-dimensional) or planar (two-dimensional) coordinates of a single point expressed in a Cartesian, cylindrical, or spherical system. In situations where a mere point is not sufficient to represent the target, richer representations are used; we will refer to them as structured. Structured representations can contain a substantial number of parameters for each target. For example, to maintain the position and wing posture of a Drosophila, [27] maintains a set of 25 parameters per fly. Even though the zebrafish larva is smaller in volume and has a geometrically simpler body envelope, the number of parameters used to represent its body position in macroscopic video studies spans two orders of magnitude, from a single pair of coordinates in high-throughput activity-monitoring studies [34] to nearly 100 points in mesh representations of the fish's outline for detailed kinematic analysis [31].

Maintaining a more complex, richer structure usually necessitates fine object resolutions, which depend on camera resolution and image magnification. Fish resolution can be expressed in number of pixels per fish, or per body length, and commonly ranges from a few pixels ([35], estimate based on camera resolution and image examples) to over 250 pixels ([31], estimate based on camera resolution and movie example) per body length. However, there is no simple rule that maps the two, as different applications may re-sample (resize) images in either direction. Depending on the chosen representation, tracking will mean very different things.


Table 1. Elementary locomotory patterns (pattern name, category, characterization, references).

Slow start (swim): Body waves of narrow amplitude and low, quickly declining mean velocity (<10 body-lengths/s); called scoots in [32]. [29]

Cyclic swimming (swim): Body waves of wide amplitude and high, near-constant mean velocity (>7 body-lengths/s). [29]

Short-latency C-start (SLC) (turn): C-bend occurring within 15 ms of the stimulus; the characteristic turn signing the start of an escape movement. [58]

Long-latency C-start (LLC) (turn): C-bend initiated 20–60 ms after the stimulus. [58]

O-bend (turn): Large-amplitude C-bend executed at low angular velocity in response to a drop in illumination. [38]

J-bend (turn): Specialized unilateral movement, characterized by a distinctive J body shape, which slowly and moderately alters the orientation and position of the larva, even allowing it to back up if needed; a component of the prey-capture sequence. [30]

Routine turn (turn): Slower turn than a C-bend, not followed by a counter-bend movement; body bending profile different from the J-bend. [28]

A richer structure may or may not facilitate tracking. On the one hand, a rich structure may provide more features to track across frames or more parameters to use for statistical fits, yielding potentially higher precision and reliability. On the other hand, it may require more sophisticated programming to handle the effects of incomplete detections and occlusions (the temporary hiding of some part of a target from the sensor) on the larger number of parameters. In [31], the mesh model fits the fish's outline but not the pectoral fins; therefore, special treatment is needed to make sure that the image of the pectoral fins does not interfere when matching the model to the fish. In general, the complexity of the object's representation drives computational costs. For example, Fontaine and colleagues [31] report an average computation time of 5.5 ± 1.7 s per frame (on a 3.0 GHz Intel® Xeon processor). Depending on the experiment's goal and throughput requirements, these costs may become prohibitive. In particular, when tracking is required in near real-time, the size of the data structures that can be maintained for each object is limited by the available computational and/or communication bandwidths. This available bandwidth directly depends on the frame rate (how much time is available to perform the computation), the number of objects to track, and the processor speed.

In many zebrafish studies, tracking is expected to generate kinematics data, such as distance swum, velocity, and acceleration. From a measurement standpoint, the generation of such data should be viewed as estimation, yielding values and associated errors that should be maintained as part of the fish representation structure. Although details on the subject are sparse in the zebrafish literature, it seems that many studies generate kinematics data via mere computation on the values of the fish parameters in consecutive frames and then smooth or bin the results. While such an approach may be reasonable for computing average values, it cannot be used to determine precise range boundaries, such as maximum instantaneous velocities. Overall, the effects of various sources of error (for example, quantization errors in both time and space, or feature extraction errors caused by varying illumination or interacting fish) on downstream computations via differencing (velocity, acceleration) or integration (distance swum) are discussed in only very few studies [36].

4.1. Point and vector representations

One of the most parsimonious manners of representing a fish position is as a point, using the coordinates of its "center", or, with an added scalar angle, as a vector. Conceptually simple, this method nevertheless requires selecting an appropriate definition of center among competing candidates, which in turn depends on the method used to illuminate the arena, the magnification, the sensor resolution, and the downstream image processing. The various definitions all aim at estimating the position of a point that maintains a constant horizontal position relative to the fish body. The following definitions are most often used and are illustrated in Fig. 2A and B.


These definitions are commonly computed using macro languages such as the ImageJ macro language (http://rsbweb.nih.gov/ij/developer/macro/macros.html), script languages such as the Matlab Image Processing Toolbox (http://www.mathworks.com/products/image/), or image processing libraries such as ImageJ (open source library of Java classes, http://rsbweb.nih.gov/ij/index.html) or OpenCV (open source programming library with C++, C, Python, and Java interfaces, http://opencv.org). In particular, most of the definitions below can be computed as image properties using the ImageJ Analyze Particles function (referred to below as "imagej") or the Matlab Image Processing Toolbox regionprops function (referred to as "regionprops"), and some studies, such as [26] or [34], provide source code as online supplementary material.

Binary centroid: The average coordinates of all the pixels in a binary region extracted from the image as representative of the fish. Such a region is usually made of connected pixels [34]. Computable as imagej 'Centroid' or regionprops 'Centroid'.

Weighted centroid: Same definition as the binary centroid except that pixels are given individual weights, which can be the gray-level value of the pixel (intensity centroid, [37]) or can result from a transformation applied to the image, for example a convolution. Computable as imagej 'Center of Mass' or regionprops 'WeightedCentroid'.

Center of bounding box: The bounding box is the smallest rectangle having borders parallel to the image borders that can contain the binary region(s) representative of the object. The coordinates of the corners of the bounding box can be computed via imagej 'Bounding Rectangle' or regionprops 'BoundingBox'.

Local maximum: The point where a particular real-valued pixel function reaches its maximum when restricted to a domain corresponding to known fish features (for example, the fish head in [38] or the eyes and swim bladder in [36]); such a function can be the light intensity or its complement, or a more complex transformation, such as a convolution or a correlation. Computable via ImageJ's MaximumFinder class or Matlab's imregionalmax function.

When fish orientation is characterized by a single angular value, custom methods are used. Creton [37] uses the vector formed by the binary and weighted centroids, Burgess [38] uses a line fitted from a centroid on the head and extending about one fourth of the body length of the fish, and Pelkowski [26] uses the vector formed by the weighted centroid and the center of the bounding box.

Despite their simplicity, centroid-based point representations are potentially problematic, as they can be subject to sampling error and their values may not be independent of resolution and illumination levels. These issues are discussed next.
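For illustration, these definitions can be reproduced outside ImageJ or Matlab in a few lines of numpy/scipy; the sketch below assumes a gray-level image in which the fish is darker than the background, with an illustrative threshold value:

```python
import numpy as np
from scipy import ndimage

def fish_centers(gray, thresh=100):
    """Compute the four center definitions above for a single dark fish
    on a lighter background (assumes at least one pixel passes thresh)."""
    ys, xs = np.nonzero(gray < thresh)          # pixels taken as fish

    binary_centroid = (xs.mean(), ys.mean())    # cf. imagej 'Centroid'

    w = (255.0 - gray)[ys, xs]                  # weights = inverted gray level
    weighted_centroid = (np.average(xs, weights=w),
                         np.average(ys, weights=w))   # cf. 'Center of Mass'

    bbox_center = ((xs.min() + xs.max()) / 2.0,       # center of bounding box
                   (ys.min() + ys.max()) / 2.0)

    # Local maximum of a pixel function: darkest point after Gaussian blurring
    smoothed = ndimage.gaussian_filter(255.0 - gray.astype(float), sigma=2)
    row, col = np.unravel_index(np.argmax(smoothed), smoothed.shape)
    local_max = (col, row)

    return binary_centroid, weighted_centroid, bbox_center, local_max
```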


Fig. 1. Assessing immobility. (A–C) A 20 s episode of freezing behavior in two juvenile zebrafish after introduction of live artemia and closure of the recording arena caused water and illumination disturbances. (A) A compound image of the 1 s recorded sequence prior to the first fish freezing reveals the compound motion of the fish (in black) and of the artemia (in red); the motion of the artemia results essentially from the turbulence created by the pipette. (B) Compound image of the 20 s during which both fish freeze. As the turbulence settles, the artemia motion results increasingly from their independent swimming. Although the fish remain strikingly immobile during the entire event, slight motion is discernible, resulting in the fuzziness pointed out by the blue arrows, likely the result of the fish's effort to fight the residual water current. (C) Compound image of the one second recorded sequence after the fish resume motion. The compound artemia motion, now entirely the result of their swimming action, is very small compared to the fish. (D) The two behavior modes (active and freezing) vary over time, and the number of artemia counted decreases over time as a result of feeding. (E–H) Patterns of activity of twelve 5 dpf zebrafish during 20 s in normal daylight conditions. (E) Fish trajectories are delineated and color discriminated to match charts (F) and (H); the three immobile fish are shown in black and circled. Moving fish are individually numbered near their start positions. The dish is drawn with a larger diameter to ease distinguishing trajectories. (F) Timing of each fish's individual movement, shown in white, binned in 267 ms intervals. The background, color coded to match the fish, corresponds to immobility and shows the prevalence of rest episodes. Among the total 675 time bins, only three (circled in black) show no discernible rest. (G) Ethograms of fish locomotory movements classified as immobility (gray), turn (blue), scoot (green), burst swim (red), and cyclic swim (yellow). The presence of discernible waves in the fish motion path is used to separate the three movements, as follows: no waves, scoot or turn (depending on body angle); one to five waves, burst; more than five waves, cyclic swim. (H) Normalized total distance swum and inactivity (percentage of time spent not actively moving). Fish are ranked from left to right in order of decreasing total distance. The black bars show the total distance swum, normalized by the largest distance (219 mm by fish 5). Each color rectangle shows fish inactivity when cumulating all episodes regardless of durations. Despite the fact that most fish are quite active, as revealed by the ethogram, inactivity is higher than 70% for all fish. Dashed white bars show inactivity when calculated after discarding rest episodes less than 1 s in duration. Solid white bars show inactivity when motion less than 1/10 of the body length is ignored. Each measure definition yields a different ranking between the fish, and none seems to convey meaningful information, illustrating the difficulties inherent in threshold-based definitions.

4.2. Sampling issues

On a given video frame, pixel values result from sampling in four dimensions, namely space (two dimensions), energy, and time. Spatial sampling is performed by the lattice arrangement of sensor pixels, which produces discrete planar coordinates. Pixel gray-level values result from the quantization (energy sampling) of the pixel charges accumulated during sensor exposure into a relatively small range of integer values (usually from 0 to 255). The periodic recording of frames at a fixed time interval (the inverse of the frame rate) performs time sampling.

The conditions under which a signal can be restored from sampled data are governed by the Shannon–Nyquist sampling theorem ([39], p. 63).


Fig. 2. Fish representation alternatives (see main text for details). (A) Point representations showing the main fish features and the positions of centroids; ocular vergence is shown [54]. (B) Enriched vector representation including maximum-based center, orientation, and bend angle [38]. (C) Contour model using a mesh to outline the fish and describe segments [31]. (D) Skeleton model with six straight-line segments defined by their nine extremities, based on manually extracted features [49]. (E) Skeleton model using a single cubic Bézier spline defined by its four control points: maximum-based center (B0) and tail tip (T0), with control points (B1 and T1) computable via a best-fit algorithm. Control points for each of the three body configurations (blue, red, and green curves) were hand-selected for illustration purposes.

The theorem states that the size of the sampling interval should be less than half the smallest period of interest. In the case of multidimensional sampling in which the sampling dimensions are independent, the theorem can be applied to each dimension separately. When the conditions of the theorem are not met and the signal contains frequencies higher than half the sampling rate (for example, when recording at 30 Hz the waving movements of an escaping zebrafish), the signal reconstructed from the sampled data is imperfect. It may also contain artifacts that take the form of artificial periodic signals absent from the original signal, a phenomenon called aliasing.

Reliable kinematic measures should remain consistent – ideally invariant – when computed under different frame rates, illumination levels, and resolutions. In many cases they are not; here is a brief summary of some of the most important considerations (see [36] for a more in-depth discussion).

Frame rate: By performing finer time sampling, higher frame rates should be expected to produce finer estimations of higher-order kinematics (by order, we mean the degree of differentiation that produces the quantity: if position is the zero-order quantity, velocity, the first derivative of position, is a first-order quantity, and acceleration, the second derivative of position, is a second-order quantity). However, this only happens if a higher frame rate does not cause a higher measurement error at each frame. There are two common conditions under which higher frame rates may produce higher measurement errors. First, exposure time being constrained by frame rate (the exposure time must be shorter than the interframe interval), the smaller amount of light captured in individual frames may yield a lower signal-to-noise ratio in the images, and therefore a higher error. Second, by reducing the distances that moving objects cover during a smaller sampling interval, a higher frame rate increases the ratio between positioning error and distance covered, a key driver of error in estimating higher-order kinematics. Therefore, perhaps surprisingly, a higher frame rate may produce lower-quality measures.
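The Nyquist condition is easy to demonstrate numerically. The sketch below samples a hypothetical 25 Hz tail-beat oscillation at 500 Hz and at 30 Hz; at 30 Hz the condition is violated (the Nyquist rate is 50 Hz) and the apparent frequency aliases down to |30 − 25| = 5 Hz. All values are illustrative:

```python
import numpy as np

f_beat, duration = 25.0, 2.0          # tail-beat frequency (Hz), duration (s)

for fs in (500.0, 30.0):              # adequate vs. insufficient frame rate
    t = np.arange(0.0, duration, 1.0 / fs)
    x = np.sin(2 * np.pi * f_beat * t + 0.37)   # phase offset avoids exact zeros
    crossings = np.sum(x[:-1] * x[1:] < 0)      # count strict sign changes
    print(f"fs = {fs:3.0f} Hz -> apparent frequency ~ {crossings / (2 * duration):.1f} Hz")
```

Run as written, the 500 Hz sampling recovers roughly 25 Hz while the 30 Hz sampling reports roughly 5 Hz, the aliased frequency.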

This trade-off between frame rate and per-frame error illustrates the relatedness between the temporal sampling rate (frame rate) and the spatial sampling rate (image resolution) discussed in [36].

Illumination level: In transmitted-light setups, illumination levels impact the sharpness of contours and the contrast between the bodies of transparent animals and the background, and therefore the size of the pixel regions that can be extracted to represent fish. The fish body being elongated, and the tail region bearing fewer pigments than the head region, this impact is not balanced: not only do the sizes of the regions vary, so do their centroids, as illustrated in Fig. 3A–C. A reliable detection system should be robust to changes in illumination levels so that light fluctuations do not create artifacts in fish position detection. Defining the fish center as the intensity extremum over the fish outline after applying a convolution, as illustrated in Fig. 3C, is a very robust approach in that respect [36]: the resulting center depends on features that are not sensitive to illumination levels (the eyes and swim bladder reflect light and create illumination peaks regardless of the intensity), and the computations, based on regional maximum extraction, do not use illumination thresholds.

Another aspect of the problem is quantization, the process of translating sensor pixel charges into discrete integer values, and the associated potential for loss of information, in particular in image saturation situations. This process can be viewed as an electron counting problem; however, electron counts are typically in the tens of thousands, far exceeding the typically 256 available quantum values, and a conversion is needed. Electron counts reflect illumination levels. In well-lit situations, they reach the highest ranges; in low-illumination situations, they span a much smaller range of values. To make sure the maximum number of available quantum values is used in the conversion process, electron counts are typically adjusted (by uniform multiplication with a gain value) before the conversion is performed.


Fig. 3. Sampling issues. (A–C) Fish centers, shown in yellow, vary depending on definitions, resolutions, and threshold levels. The same reference image is used in every image. The top row simulates a resolution of 40 pixels per body length (approx. 10 pixels per mm); the bottom row, 20 pixels per body length (approx. 5 pixels per mm). (A) Binary centroid of thresholded region. The center is defined as the binary centroid of all the pixels below a given threshold, rounded to the nearest pixel. The value of the threshold – an absolute gray-level value on a scale of 0–255 – shared by both images in each column, is shown in the middle of each column. As the threshold increases, the region includes more pixels, and the center migrates toward the mid-body; the center position's sensitivity to threshold levels implies sensitivity to illumination levels as well. As a result of sampling with different pixel sizes, the center's migration is stepwise; in addition, steps and actual positions may or may not coincide between the two rows. Contrary to the illustration, it is usual first to select the largest connected region before computing the centroid. Selecting regions based on threshold is akin to resampling, and therefore anti-aliasing measures should be taken, as illustrated next. (B) Binary centroid of thresholded region after anti-aliasing. Same centroid definition as in (A) after the image has been anti-aliased via convolution with a Gaussian of standard deviation 2 (top row) or 1 (bottom row), as determined by the "bandwidth limit", the minimum region size. (The threshold values have been multiplied by a factor of 1.6 to adjust for the convolution and to illustrate a wide range of values.) The centers' positions are still sensitive to the threshold level, but in a much smoother manner than in (A). In addition, the centers' positions coincide at the two resolution levels. (C) Maximum of thresholded region after anti-aliasing. Using the same anti-aliased images and threshold definitions as in (B), the center is now defined as the pixel(s) with minimum intensity values. The centers' positions do not vary with threshold level and are consistent at the two resolution levels. (D) Fractal effect: longer measurements can result from more details uncovered by higher resolution. In this example, a wavy cyclic swim episode is shown under two sets of conditions; in one case (bottom) the image reveals the cyclic pattern; in the other (top) an anti-aliasing step has been taken to remove high-frequency patterns (a required step when sequences are recorded at insufficient spatial or temporal resolutions). The trajectories are delineated in blue (top) and red (bottom) on each image and are also shown superposed (blue + red, middle), along with their corresponding lengths. The length difference illustrates a fractal effect.

Depending on how the gain is adjusted, high electron counts may be lumped together and converted to a single maximum intensity value (typically 255), causing all detail in those pixels to be lost, a situation called image saturation.

Image resolution: The impact of resolution is also complex. One of its facets is the potential for fractal [40] or pseudo-fractal [36] effects. A fractal effect example is illustrated in Fig. 3D: when calculated on a 5 pixels-per-mm image on which waves are visible, the distance covered during a cyclic swim episode is found to be 10 percent larger than when computed from the same image at 2.5 pixels-per-mm resolution, because proper anti-aliasing measures have removed the waves.

4.3. Structured representations

In these models, the target is viewed as a multidimensional solid, which can be rigid, deformable, or articulated.

4.3.1. Contours and articulated models

Target contours can be represented by the union of one or several simple geometric shapes (a single ellipse to fit the Drosophila body in [41]), a piecewise union of segments (zebrafish embryos in [42]), a spline, a snake – also a spline, but fitted to the shape by a different process – (zebrafish embryos in [43]), a mesh (zebrafish larvae and juveniles in [31], see Fig. 2C), or an articulated model combining several contours in a coordinated manner (mice in [44]). Higher-level models are found in the cell-tracking and in vivo 4D imaging literature, for example in [45] for early zebrafish embryogenesis. A concise review is found in [46]. In studies monitoring interacting subjects (zebrafish predator–prey interaction in [47], Drosophila peer-to-peer interaction in [41]), data describing the relative position of each subject is commonly added to the model.
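As a concrete example of the simplest contour model above, the sketch below fits a single ellipse to the largest dark blob with OpenCV, in the spirit of the single-ellipse Drosophila body model of [41]; the file name, threshold value, and the fish-darker-than-background assumption are all illustrative:

```python
import cv2

gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)     # hypothetical frame
_, binary = cv2.threshold(gray, 60, 255, cv2.THRESH_BINARY_INV)
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
if contours:
    body = max(contours, key=cv2.contourArea)    # largest blob = candidate body
    if len(body) >= 5:                           # fitEllipse needs >= 5 points
        (cx, cy), axes, angle = cv2.fitEllipse(body)
        print(f"center=({cx:.0f},{cy:.0f}), axes={axes}, angle={angle:.0f}")
```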

4.3.2. Skeleton models

An economical alternative to contour maintenance is to characterize the fish's body geometry by its digital skeleton. The digital skeleton and its variants have a formal definition in digital image processing [48]. Applied to the zebrafish's silhouette, this definition happens to coincide with the fish backbone and its protruding fins. In many cases, the simplicity of the fish outline lends itself to a very compact representation, obtained by approximating the skeleton with a small number of straight segments. For example, Burgess and colleagues [38] use three like-sized segments to estimate the fish's bend angle (Fig. 2B), and McClenahan and colleagues [49] use nine control points to represent the fish and its two pectoral fins (Fig. 2D). Slightly more expensive but providing a tighter fit, the skeleton can be approximated in a piecewise manner by a combination of low-degree B-splines characterized by their control points. Martineau [36] proposes to fit a single cubic Bézier curve, characterized by three Bézier points in addition to the fish center (a total of four Bézier points; Fig. 2E), to the part of the skeleton that extends caudally from the swim bladder (a minimal evaluation sketch is given at the end of Section 4.4).

4.4. Planar vs. volumetric representations

Only a handful of studies to date have considered zebrafish in the three-dimensional volume in which they evolve ([50,51] adults; [52] larva's vertical position; [53] larva computer simulation). Most zebrafish larva studies use a two-dimensional horizontal coordinate system to represent the larva position and neglect the depth component. This results from a combination of factors. The observation tanks, whether multi-well plates or Petri dishes, are very hard to image other than along a vertical axis. To remain in focus for the vertically pointing imaging sensor, variations in the fish's vertical position must stay within a small upper bound of 5 mm or less.


Constraining the fish movement vertically is usually achieved either by limiting the depth of the water column ([37] 3 mm; [54] 4 mm; [36] 4.5 mm; [26] 5 mm) or by physically restraining the fish body [54]. When constraining the larva movement is not possible and deeper water columns must be used (e.g., 8 mm in [29], 2.5 cm in [47]), scenes in which vertical depth is maintained are selected manually or created by the proactive control of the researcher while recording the experiment. Therefore, we will examine larva motion as a two-dimensional, planar problem.
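As promised in Section 4.3.2, evaluating a cubic Bézier midline from its four control points is inexpensive; the actual work lies in fitting the two free control points to the extracted skeleton. A minimal evaluation sketch, with illustrative control-point coordinates:

```python
import numpy as np

def cubic_bezier(p0, p1, p2, p3, n=50):
    """Evaluate a cubic Bezier midline from its four control points
    (e.g., fish center, two free control points, and tail tip)."""
    t = np.linspace(0.0, 1.0, n)[:, None]
    p0, p1, p2, p3 = map(np.asarray, (p0, p1, p2, p3))
    return ((1 - t) ** 3 * p0 + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2 + t ** 3 * p3)

# Example: a gently bent midline; summed segment lengths give body length.
midline = cubic_bezier((100, 100), (120, 96), (140, 104), (158, 118))
length_px = np.sum(np.linalg.norm(np.diff(midline, axis=0), axis=1))
```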

5. Feature extraction

Feature extraction is the subject of entire chapters in digital image processing textbooks [39]. In the context of zebrafish tracking, it can be summarized as (1) the identification of image pixels that belong to the tracked fish and (2) the assembling of these pixels into coherent regions that can be mapped to individual fish.

Video target detection can be very difficult. Not all target pixels may be visible, for reasons of lack of sharpness, noise, or occlusion by other targets or objects present in the tank. Occlusions can create situations in which some targets are totally missing from a particular frame. Or, because some target pixels may be hidden, pixels belonging to a given target may not form a topologically connected region and may be mistakenly taken as two different targets. Sensor noise and frame-to-frame illumination variations add further difficulties to the process of extracting targets from the rest of the image.

In addition to these generic difficulties, zebrafish larvae bring specific challenges. Low pixel-per-fish counts, a result of small size and low pigmentation, make detection even more difficult under transmitted light; in particular, body contours and fin outlines are usually very hard to see for cameras and humans alike. In addition, free-swimming larvae have a tendency to rest frequently along the side of the arena (this is how we will refer to the observation container); when there, the larvae may be partly hidden from camera view due to parallax issues or the meniscus interfering with the image path. On the positive side, zebrafish larvae offer a few attributes that can be exploited. As with most fish, zebrafish bodies, disregarding the pectoral fins, possess simpler topological properties (such as convexity, for example) than other animals. Their body transparency can be used advantageously under darkfield-type illuminations, and their eyeballs and swim bladder are highly reflective to incident light.

The algorithmic steps most relevant to zebrafish larva tracking are reviewed below. Some of these steps are described in more detail and from a different perspective in [55].

5.1. Background processing

The background image of the arena often needs to be estimated. Usually, this is because the downstream processing involves background subtraction, a common technique for extracting moving features from images. Another reason may be that the monitoring system performs automated image registration and is looking for known background features (the shape of a multi-well plate, for example, as in [34]). A variety of techniques exists for background estimation, from simple operations performed once on one or several images (for example, calculating the average) to complex operations computed and updated at regular intervals (for example, building and maintaining a statistical model of the background).
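As a minimal illustration of the simple end of that spectrum, a per-pixel median over frames sampled across the recording rejects moving fish as long as each pixel is fish-free in most samples (cf. the 200-sample scheme of [41] discussed below); a sketch, assuming the sampled frames fit in memory as gray-level arrays:

```python
import numpy as np

def median_background(frames, n_samples=200):
    """Estimate a static background as the per-pixel median of n_samples
    frames taken at constant intervals over the whole recording."""
    idx = np.linspace(0, len(frames) - 1, n_samples).astype(int)
    stack = np.stack([np.asarray(frames[i], dtype=np.float32) for i in idx])
    return np.median(stack, axis=0)
```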

Unlike other treatments, background processing is usually not performed at every frame cycle.

It would seem intuitive to estimate the background from images of the empty arena (with water but no fish). However, Branson [41] notes that this approach is problematic because of the risk of the arena being slightly shifted after the animals (Drosophila in this case) have been introduced, an incident that may have major processing consequences downstream. With fish, introduction into the arena may modify the water level or move potential water impurities, creating further problems. Therefore, background processing should be able to cope with the presence of animals in the arena. In [41], the approach is to build a background model as a mixture of Gaussians, one per pixel, estimated from 200 samples taken at constant intervals over the entire recording. Each Gaussian is centered on the pixel's median intensity and uses the standard deviation of the absolute difference of that pixel's intensity with the median. This approach may be problematic if some animals stay immobile during the entire sampling period, as they may become part of the background. In this case, an additional procedure needs to identify and remove such animals from the sample images. The image voids created by such removals are then refilled with synthesized gray-level values; a typical process is to solve Laplace's diffusion equation to compute interpolated values for the region's interior based on the region's boundary pixel values [31].

Estimating the background only once is possible when tracking is performed post-mortem (after the recording is complete) over a period during which illumination conditions do not change. When tracking must start before the experiment is complete, or when the experiment's conditions may include light phase shifts (in overnight recordings for example), changes in water levels, or shifts in the arena or changes in its contents, the background may need to be updated periodically.

5.2. Preprocessing

In many cases, images need an initial generic processing step before actual feature information can be extracted. Common operations include:

Region of interest (ROI) extraction: In many situations – for example, a circular arena imaged with a typical rectangular camera sensor – recording conditions allow the a priori exclusion of certain sensor regions from further processing, reducing the computational load. In many studies, this procedure is performed manually.

Arena detection–registration–calibration: The fixed geometry of the arena can be located and the illumination level calibrated automatically. Branson [41] uses a Hough transform [39] to detect the circular shape of the observation arena. Dankert [27] uses a Canny edge detector [39] for the same purpose. Martineau [36] detects light day/night phases and transitions by measuring the light intensity of a specific tank location.

Resizing: Processing may require images at resolutions different from the capture resolution, or at several different resolutions when building an image pyramid [39].

Filtering: A variety of goals (noise removal, deblurring, contrast enhancement, blurring, anti-aliasing, registration, etc.) can require the application of various types of filters, convolutions, or fast Fourier transforms.
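For the arena-detection step, the circular Hough transform cited above for [41] is available directly in OpenCV; in the sketch below, the file name and all parameter values (radii in pixels, accumulator thresholds) are illustrative and would need tuning to the actual optics:

```python
import cv2
import numpy as np

gray = cv2.imread("arena.png", cv2.IMREAD_GRAYSCALE)   # hypothetical frame
blurred = cv2.medianBlur(gray, 5)                      # suppress sensor noise
circles = cv2.HoughCircles(blurred, cv2.HOUGH_GRADIENT, dp=1, minDist=500,
                           param1=100, param2=50, minRadius=200, maxRadius=400)
if circles is not None:
    cx, cy, r = circles[0][0]               # strongest circle = arena
    mask = np.zeros_like(gray)              # build a circular ROI mask
    cv2.circle(mask, (int(cx), int(cy)), int(r), 255, thickness=-1)
```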


5.3. Thresholding – background subtraction

Thresholding eliminates irrelevant pixels by comparing them to a threshold. The threshold can have a single value for the entire image (basic thresholding), or its value may vary with the pixel's location (adaptive thresholding). In either case, the threshold value can be fixed or computed dynamically; an example of the latter is optimal thresholding, in which one attempts to minimize a cost function. Thresholding is frequently performed after the background or the previous frame [34] has been subtracted from the image. In some other cases, thresholding is performed after a more specialized transform – for example, a morphological or a filtering operation – has been applied to the image.

Thresholding is not always a one-step process. In [41], thresholding with hysteresis [39] is used to minimize the impact of noise: region labeling (see below) is performed on a binary image produced from a low threshold value, and any region not containing values above a second, higher threshold is discarded.

In the zebrafish literature, the result of thresholding is usually a binary image. Other types of thresholding can produce other types of images, such as real-valued probabilistic images suitable for computing fuzzy connectedness [39]. Fuzzy approaches alleviate a major potential problem with binary thresholding: the potential loss of a meaningful track due to missed detections on a single frame.

Binary thresholding may create new hidden aliasing artifacts downstream if its input is not processed properly; these artifacts are illustrated in Fig. 3A. The precise significance of these issues and their adequate handling are beyond the scope of this paper; we will simply outline their cause and potential impact. Thresholding can be viewed as a requantization of the image, and removal of small regions produced via connected component labeling (defined next) on the basis of size is akin to spatial resampling (or resizing), and therefore subject to the Shannon–Nyquist sampling theorem. Removal of small regions, a widespread approach, is usually not justifiable unless adequate prior anti-aliasing measures (typically via blurring) have been taken (see Fig. 3A and B for an illustration). Many studies do not provide details on whether and how these steps are taken, often as a consequence of dependencies on third-party, black-box-type software. A frequent consequence of improper processing is dubious movement, which is then eliminated by averaging or binning; this has the unfortunate side effect of reducing the system's sensitivity to small movements and true immobility. We presume one reason for not taking proper anti-aliasing steps may be that such steps would further reduce the already low contrast of the larvae when illuminated in transmitted light.

5.4. Absolute extraction

In some illumination situations, light intensity can be interpreted directly. For example, larvae illuminated tangentially, as in [36], create peaks of intensity around their highly reflective body regions, the eyes and the swim bladder. After proper image blurring (see preprocessing above), the anterior position of the fish can be extracted by searching among the blurred image's local maxima for a specific morphological pattern, via a correlation or a morphological operation for example. Matches can be sorted on the basis of fitness strength and the desired number of extracted fish.
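A minimal sketch of this blur-then-peak idea follows; the function name and all parameter values are illustrative, and the morphological pattern matching described above is reduced here to simply keeping the strongest regional maxima:

```python
import numpy as np
from scipy import ndimage

def bright_spots(gray, sigma=2.0, n_fish=12, min_sep=10):
    """Blur the image, then return the n_fish strongest regional maxima
    (candidate eye/swim-bladder reflections under tangential or darkfield
    illumination), enforcing a minimum separation between detections."""
    smoothed = ndimage.gaussian_filter(gray.astype(float), sigma)
    # A pixel is kept if it equals the maximum of its min_sep neighborhood
    peaks = smoothed == ndimage.maximum_filter(smoothed, size=min_sep)
    ys, xs = np.nonzero(peaks)
    order = np.argsort(smoothed[ys, xs])[::-1][:n_fish]  # by fitness strength
    return list(zip(xs[order], ys[order]))
```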
5.5. Connected component or region labeling – handling occlusions

Once relevant pixels have been identified, they need to be assembled into topologically connected regions.


Connected component labeling [39] is typically used for that purpose. It can also be applied to other types of images besides intensity images, such as gradient or Laplacian images, for more specialized segmentation purposes – connected contour detection, for example.

When several fish are present, the output of region labeling may be difficult to interpret. Bodies of fish near each other may produce fused regions, and occlusions caused by overlapping fish bodies may occur. Kato and colleagues [56] use a kind of morphological opening (see morphological processing below) to split such regions. By contrast, Delcourt and colleagues [57] authorize regions to represent two fish temporarily, subsequently attempting to reassign proper fish identity on the basis of motion analysis. In general, the richer the representation model, the more tools can be used to resolve ambiguous situations; however, it seems prudent to assume that unless three-dimensional position or motion dynamics are captured, some situations will remain very difficult to disambiguate. Fig. 4D shows an actual example of a situation difficult to disambiguate statically.

5.6. Filtering

The same types of operations described under preprocessing can be applied for specific feature extraction purposes; for example, edge detection.

5.7. Morphological processing

Morphological image analysis groups a large family of operations [48]. Most commonly in the zebrafish literature, dilation, erosion, and their compositions, opening and closing, are used to remove small irrelevant regions, to reconnect artificially fragmented regions, or to separate close or overlapping targets [56]. A wider palette of operations is considered in [36], including watershed mapping, image reconstruction, fill-hole and h-extrema transforms, and geodesic operations.

5.8. Putting it together

The number of steps involved in extracting features and assembling them into targets may be quite large. Two interesting examples of complex detection chains are detailed in two Drosophila studies performed at standard video frame rates. One uses a commercial VGA camcorder [27] at an effective resolution of 20 pixels per fly length (10 pixels per mm), while the other uses an XVGA industrial camera [41] at an effective resolution of 8 pixels per fly length (4 pixels per mm).

6. Target tracking – frame-to-frame association

Humans perform motion tracking routinely by anticipating movement. There are many ways to program this intuitive ability in software, and some of them are quite complex. Some basic math will help describe and classify these approaches and identify the root of the difficulty of tracking zebrafish larvae.

When watching a moving object, a tennis ball or a skater zigzagging on ice for example, we can intuitively compute the object's future course based on our appreciation of its velocity and acceleration. From a mathematical point of view, this intuitive computation can be justified by Taylor's theorem which, under certain conditions, allows approximating a function in the vicinity of a point where the values of the function and its derivatives are known.


Fig. 4. Examples of hard-to-track zebrafish swim patterns. (A) A 266 ms sequence of a clutch of 60 ten-dpf zebrafish during the day. Although the clutch is highly active, little individual fish activity is revealed during the period, a trademark of the burst-and-coast mode of locomotion. (B) Complex 3-fish collision extracted from the image in (A). (C) 20 s sequence of twelve 5 dpf larvae showing a variety of hard-to-track situations; the fish show a wide repertoire of movements, including fast cyclic swimming; one cyclic swimming episode takes place along the side of the Petri dish (circled in yellow), making it easy to miss if parallax and meniscus effects are not properly addressed. (D) During a cyclic swim episode, a fish crosses the position of an immobile fish, causing a partial occlusion (green circle in C). A fish crossing an immobile fish – a situation easy to resolve with motion prediction – may cause least-squares and k-nearest-neighbor approaches to fail: the squared distance produced by the rightful association will often be larger than the total squared distance produced by switching fish identities (provided the trajectory of the moving fish has a large enough curvature).

Under adequate continuity and differentiability hypotheses, the theorem can be formulated to express the position $\vec{P}$ of an object at time $t_0 + \delta$ as the sum of its first $n$ derivatives $\vec{P}^{(k)}$ and a remainder $\vec{R}_n$ (which can be upper-bounded or computed):

\[
\vec{P}(t_0 + \delta) \;=\; \sum_{k=0}^{n} \frac{\vec{P}^{(k)}(t_0)}{k!}\,\delta^{k} \;+\; \vec{R}_n(t_0 + \delta) \tag{1}
\]

Using a more specific formulation, we can calculate the position at time $t_0 + \delta$ considering only the position $\vec{P}$, the velocity $\vec{P}'$, and the acceleration $\vec{P}''$, provided they are continuous:

\[
\vec{P}(t_0 + \delta) \;=\; \vec{P}(t_0) \;+\; \delta\,\vec{P}'(t_0) \;+\; \int_{t_0}^{t_0+\delta} (t_0 + \delta - t)\,\vec{P}''(t)\,dt \tag{2}
\]
Unfortunately, the acceleration continuity condition is violated by zebrafish larvae in a Petri dish every time they perform a C-start. In brief, this is what makes tracking larvae so difficult. Setting this difficulty aside for now, we will review the various tracking techniques according to the degree to which they leverage Taylor's theorem.

Perfect tracking may not always be needed. In some situations, correct fish identity has to be maintained only for as long as required by the computation of a particular measure [57]. This type of statistical tracking, in contrast to rigorous tracking, greatly lowers the performance requirements. Delcourt and colleagues [57] further note that, provided assignment error rates are reasonable, hand-correcting assignment errors can be practical in many situations.

6.1. Approaches to target motion tracking

6.1.1. Segregation – strongest neighbor

By far the most common situation in which groups of zebrafish larvae are routinely tracked in the literature is while individually segregated in multi-well plates. Provided that detection is correct, such a setup removes the potential for track mis-assignment, as association reduces to registering each image to the fixed frame of reference of the containers. Notwithstanding the potential impact of a small enclosure on fish behavior, the main concern is then simply the accurate estimation of the fish position on each frame, and Taylor's theorem is not used.

Despite the fact that fish are segregated, target detection will likely need to select a candidate among pixel-grouping alternatives, as illustrated in Fig. 2A. In general, the strategy is to select the strongest among a set of potential candidates, an approach we will refer to as strongest neighbor. In [34], for example, the fish's representative region is extracted by applying a threshold to the difference between the current and previous frames and by removing connected components with pixel counts below a set threshold. If no such region is found, the position of the fish at the previous frame is used; if several candidate regions are found, the event is recorded in an exception tracking file, and the region with the highest pixel count is selected.

Even though larva identity allocation errors are prevented by physical means, and the integrity of fish identity data is therefore guaranteed, the quality of the extracted data is not. In Fig. 3A–C, we provided examples illustrating the fact that many detection methods are sensitive to illumination and resolution levels and can produce low-quality positional data. Cario and colleagues [34] use a combination of filtering and thresholding on the extracted position to extract kinematics and activity data. Since the authors provide open access to their source code, the details of the processing can be verified or modified to fit specific purposes. We do not know which techniques are used by commercial systems such as DanioVision (Noldus Information Technology) or ZebraLab (ViewPoint Life Sciences) to track segregated fish, because the actual source code is not readily available for review.

6.1.2. Nearest neighbor

The nearest neighbor approach consists in associating targets with the closest tracks in the previous frame. It leverages Taylor's theorem at order zero, which simply expresses the continuity of a physical object's position over time. It provides a practical solution if $\delta$ can be made small enough (usually via a high enough frame rate) that fish stay within non-intersecting spheres centered at their $t_0$ positions and having $\lVert\vec{R}_0\rVert$ (from Eq. (1) above) as their radii. These conditions can be met in the following zebrafish recording situations:

• Segregated fish: individual fish containers do create spheres that fully contain each fish, thus providing an upper bound for $\lVert\vec{R}_0\rVert$ and justifying the selection of the nearest neighbor whatever the chosen frame rate.

• Selected video sequences: in some situations, video sequences in which fish consistently remain sufficiently far apart can be selected. This is the approach taken in [38] and [58]; provided the fish density is not too high, sequences long enough for the required processing can be isolated.

In general, a nearest neighbor-based approach cannot be safely followed when tracking groups of zebrafish larvae, due to the larvae's typical interaction and movement.
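Before detailing why, note that the least-squares assignment evoked here (the baseline that Fig. 4D shows can swap identities) takes only a few lines with the Hungarian algorithm; a minimal sketch, with an illustrative gating distance:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def associate(tracks, detections, max_dist=20.0):
    """Globally optimal least-squares association between previous-frame
    track positions and current-frame detections (both (N, 2) arrays of
    pixel coordinates). Pairs farther apart than max_dist are rejected."""
    cost = cdist(tracks, detections) ** 2          # squared distances
    rows, cols = linear_sum_assignment(cost)       # Hungarian algorithm
    return [(r, c) for r, c in zip(rows, cols)
            if cost[r, c] <= max_dist ** 2]
```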


Zebrafish larvae very often come close to each other, causing their images to overlap, and their velocities show frequent spikes. Under these conditions, it is no longer possible to isolate each individual fish between consecutive frames in non-intersecting spheres; several fish may share the same nearest neighbor, or the nearest neighbor of some fish may be a different fish altogether. Using the k-nearest neighbors instead and making the final assignment via least-squares minimization will not work either, as the example in Fig. 4D shows. Increasing the frame rate is not a viable option; besides generating undesirably large data volumes in long recordings, other problems would appear due to potential detection errors, whose rate would increase with the frame rate.

Because of the larvae's small size and transparency, it should be noted that a pure, unsupervised nearest neighbor approach may be problematic even with individually segregated fish, because the enclosure may contain other objects or illumination discontinuities that the system mistakenly detects as fish. In such situations, a system may lock or periodically jump onto an irrelevant target. Therefore, unless the detection results are monitored, the kinematics is checked for sanity, or multiple hypotheses are maintained, an unsupervised nearest neighbor approach to tracking zebrafish larvae has an inherent potential for producing hard-to-detect artifacts.

6.1.3. Motion prediction: predictor–corrector methods

We will now consider Taylor's theorem at the first order, the next order up, and attempt to use knowledge of prior velocity to predict motion. Typically, motion prediction-based tracking methods are implemented algorithmically via a predictor–corrector model. In this model, tracking from frame to frame is performed by first anticipating the position of tracks on the current frame based on their values on the immediately preceding frame, then by correcting the estimate with the latest target data extracted from the current frame. Two main types of algorithms implement these methods, Kalman filters and particle filters, which we briefly describe next.

Kalman filter: Kalman filtering [59] is the oldest and most widely used implementation of a predictor–corrector model for tracking purposes. In the predictor phase, estimates of the targets' positions are computed from their previous values via a linear motion model to which a white Gaussian noise is added. In the corrector phase, the estimated target positions are corrected by minimizing the residual difference between predicted and observed values, and the estimate covariances are adjusted accordingly. In its original form, Kalman filtering requires the prediction model to be linear and the process and observation noises to be Gaussian-distributed. These requirements are very restrictive when considering zebrafish tracking, as they forbid nonlinear dynamics, a common occurrence in larva motion, as well as multi-modal swim modes. The linearity requirement has been relaxed in two subsequent developments, the Extended Kalman Filter [60] and the Unscented (or Sigma Points) Kalman Filter [61], the latter leading to higher performance when tracking highly nonlinear motion. Fontaine and colleagues [31] use Sigma Points Kalman filtering for tracking the perimeter points of a mesh representation of the zebrafish larva body. The Kalman filter process is designed to produce globally optimal solutions, a key asset in the context of multi-target tracking.
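To make the predictor-corrector cycle concrete, here is a minimal linear Kalman filter for a single coordinate with a two-dimensional (position, velocity) state. The noise covariances are illustrative, and this is the basic linear form, not the Sigma Points variant used in [31]; a sketch only:

```python
import numpy as np

F = np.array([[1.0, 1.0], [0.0, 1.0]])   # constant-velocity motion model
H = np.array([[1.0, 0.0]])               # we observe position only
Q = 0.05 * np.eye(2)                     # process noise (model uncertainty)
R = np.array([[1.0]])                    # measurement noise

def kalman_step(x, P, z):
    """One predictor-corrector cycle: x is the (position, velocity) state,
    P its covariance, z the measured position for the current frame."""
    x = F @ x                              # predictor: propagate the state
    P = F @ P @ F.T + Q                    # propagate the covariance
    y = z - H @ x                          # residual (innovation)
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)         # Kalman gain
    x = x + K @ y                          # corrector: update with measurement
    P = (np.eye(2) - K @ H) @ P
    return x, P
```

Each call consumes one measured position and returns the updated state and covariance; the extended and unscented variants replace the constant matrices F and H with nonlinear propagation functions.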
6.1.3. Motion prediction: predictor–corrector methods

We now consider Taylor's theorem at the first order, the next order up, and attempt to use knowledge of prior velocity to predict motion. Motion prediction-based tracking methods are typically implemented via a predictor–corrector model: tracking from frame to frame is performed by first anticipating the positions of the tracks in the current frame from their values in the immediately preceding frame, then by correcting these estimates with the latest target data extracted from the current frame. Two main families of algorithms implement these methods, Kalman filters and particle filters, which we briefly describe next.

Kalman filters

Kalman filtering [59] is the oldest and most widely used implementation of a predictor–corrector model for tracking purposes. In the predictor phase, estimates of the targets' positions are computed from their previous values via a linear motion model to which white Gaussian noise is added. In the corrector phase, the estimated target positions are corrected by minimizing the residual difference between predicted and observed values, and the estimate covariances are adjusted accordingly. In its original form, Kalman filtering requires the prediction model to be linear and the process and observation noises to be Gaussian. These requirements are very restrictive for zebrafish tracking, as they exclude nonlinear dynamics, a common occurrence in larva motion, as well as multi-modal swim modes. The linearity requirement has been relaxed in two subsequent developments, the Extended Kalman Filter [60] and the Unscented (or Sigma-Point) Kalman Filter [61], the latter yielding higher performance when tracking highly nonlinear motion. Fontaine and colleagues [31] use Sigma-Point Kalman filtering to track the perimeter points of a mesh representation of the zebrafish larva body.

The Kalman filter process is designed to produce globally optimal solutions, a key asset in the context of multi-target tracking. However, when its performance degrades, for example when it produces unacceptably high estimate covariances, subsequent processing such as multiple hypothesis tracking may be required. Multiple hypothesis tracking can represent a substantial component of the overall multi-target tracking process; it can be implemented by leveraging a variety of approaches from the fields of sensor fusion and knowledge-based systems. A related approach uses motion prediction to find a globally optimal track-target allocation and minimizes the need for track deletion and creation by incorporating hindsight [41]; it has been applied successfully to the tracking of interacting, flight-deprived Drosophila at an effective resolution of 8 pixels per fly length (comparable to many zebrafish larva recording conditions) and a 20 Hz frame rate. In the context of tracking zebrafish larvae in group, however, we believe that the alternative approaches to Kalman filtering described next are more suitable.
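Before moving on, the following minimal sketch shows the predict–correct cycle on a single target, assuming a constant-velocity motion model, 2D centroid observations, and synthetic noise magnitudes. The helpers `make_cv_model` and `kalman_step` are illustrative names; this is not the filter of [31] or of any published zebrafish tracker.

```python
import numpy as np

# State x = [px, py, vx, vy]; constant-velocity model over time step dt.
# The noise magnitudes q and r are illustrative, not calibrated values.
def make_cv_model(dt, q=1.0, r=2.0):
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1,  0],
                  [0, 0, 0,  1]], dtype=float)  # linear motion model
    H = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0]], dtype=float)   # only position is observed
    return F, H, q * np.eye(4), r * np.eye(2)

def kalman_step(x, P, z, F, H, Q, R):
    # Predictor phase: propagate state and covariance through the model.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Corrector phase: weigh the predicted position against the observed
    # centroid z via the Kalman gain K, then shrink the covariance.
    S = H @ P_pred @ H.T + R                    # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)         # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(4) - K @ H) @ P_pred
    return x_new, P_new

# Usage on a synthetic straight swim observed with noise at 25 Hz.
dt = 0.04
F, H, Q, R = make_cv_model(dt)
x, P = np.array([0.0, 0.0, 10.0, 0.0]), np.eye(4)
rng = np.random.default_rng(0)
for k in range(1, 50):
    z = np.array([10.0 * k * dt, 0.0]) + rng.normal(0, 1.4, size=2)
    x, P = kalman_step(x, P, z, F, H, Q, R)
print(x[:2])  # estimated position, close to the true (19.6, 0)
```

A persistently large innovation (z - H x_pred), or a growing P, signals the degradation discussed above, which is the point where multiple hypothesis tracking or the alternatives below become relevant.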


Particle filters

Situations combining frequent occlusions with nonlinear motion, as in a clutch of zebrafish larvae in a Petri dish, may be too challenging for the practical application of Kalman filtering. Particle filters offer an interesting alternative. This technique can be viewed as a generalization of the Kalman filter in which the linear model of the predictor phase is replaced by a stochastic process. In its first installment, CONDENSATION (CONDitional DENSity PropagATION [62]), the stochastic process uses factored sampling from a probability distribution. Subsequent developments have examined various alternative stochastic processes; for example, a Markov chain Monte Carlo (MCMC) approach, designed to handle missed detections as well as multiple redundant detections, is illustrated in [63] with the tracking of a group of 20 Aphaenogaster ants.

To capture the essence of the process and relate it to our underlying rationale: particle filtering no longer attempts a Taylor's theorem-based approach and is consequently free from differentiability and continuity constraints. Instead of using known values of the derivatives, different candidate targets and target motions are tried, based on statistically reinforced knowledge from the previous frame, in the hope of maximizing the chances of a match. Monte Carlo simulation, made practical today by the performance of desktop processors, allows for a trial-and-error approach to finding a best-matching target-track allocation. Few studies compare the power of these algorithms in the context of zebrafish tracking, and fewer still in the more challenging animal tracking situations in which background and illumination are not controlled. French [64] provides an interesting, in-depth discussion of these algorithms in such an uncontrolled context, with the tracking of individual white Peking ducks in a flock recorded in a large outdoor pen on a farm under natural conditions.
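The following is a minimal sketch of the simplest, bootstrap form of particle filtering on a single 2D target. It implements neither the factored sampling of CONDENSATION [62] nor the MCMC moves of [63]; the random-walk motion model, the Gaussian likelihood, the particle count, and the helper name `particle_filter_step` are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def particle_filter_step(particles, weights, z, motion_std=3.0, obs_std=2.0):
    # Predictor: propagate every particle through a stochastic motion model
    # (here a plain random walk) instead of a single linear extrapolation.
    particles = particles + rng.normal(0.0, motion_std, particles.shape)
    # Corrector: reweight each particle by the likelihood of the observed
    # centroid z given that particle's hypothesized position.
    d2 = ((particles - z) ** 2).sum(axis=1)
    weights = weights * np.exp(-0.5 * d2 / obs_std**2)
    weights /= weights.sum()
    # Resample to concentrate particles on high-likelihood hypotheses.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

# Usage: 500 particles tracking a noisily observed target moving rightward.
n = 500
particles = rng.normal([0.0, 0.0], 5.0, (n, 2))
weights = np.full(n, 1.0 / n)
for k in range(1, 30):
    z = np.array([2.0 * k, 0.0]) + rng.normal(0.0, 2.0, 2)
    particles, weights = particle_filter_step(particles, weights, z)
print(particles.mean(axis=0))  # posterior mean, near the true (58, 0)
```

Because the motion model is sampled rather than linearized, different subsets of particles can in principle carry different motion hypotheses, which is what makes the approach attractive for occlusion-prone, nonlinear motion.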
6.1.4. Motion in hindsight: track-before-detect

An altogether different approach is taken in [36]: the frame exposure time is increased and the interframe time decreased, while noise is kept at a minimum by darkfield illumination. When the interframe time is reduced to the 1 ms minimum supported by the camera hardware, fish images on consecutive frames become connected, and fish trajectories can be extracted directly from the images as lines of maximal light intensity. The majority of transient occlusion situations, such as the one shown in Fig. 4, are naturally resolved by the dynamics captured in the image. The only occlusions requiring specific processing are those creating trajectory ambiguities; these are flagged and can be resolved via heuristics, via graph shortest-path considerations, or by visual inspection. The data underlying Fig. 1 were computed using this technique.
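As an illustration of what extracting trajectories as lines of maximal intensity can amount to in practice, the following is a minimal sketch using scikit-image. The Otsu threshold, the skeletonization step, and the helper name `extract_streaks` are our illustrative assumptions, not the pipeline of [36].

```python
import numpy as np
from skimage.filters import threshold_otsu
from skimage.morphology import skeletonize
from skimage.measure import label, regionprops

def extract_streaks(frame):
    """Reduce bright motion streaks in a darkfield long-exposure frame to
    one-pixel-wide curves, one connected component per unambiguous track."""
    binary = frame > threshold_otsu(frame)  # separate streaks from background
    skeleton = skeletonize(binary)          # thin each streak to its centerline
    labels = label(skeleton)                # one label per candidate trajectory
    return [r.coords for r in regionprops(labels)]  # pixel coords per track

# Usage on a synthetic frame containing one slightly thick diagonal streak.
frame = np.zeros((64, 64))
for i in range(10, 50):
    frame[i, i - 2:i + 3] = 1.0
tracks = extract_streaks(frame)
print(len(tracks), tracks[0][:3])  # 1 track; its first few coordinates
```

Ambiguous crossings would appear here as branching skeletons, which is where the heuristic, shortest-path, or visual-inspection resolution mentioned above would take over.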


To provide mathematical justification, we can relate this approach to the Taylor expansion of Eq. (2), rewritten at the first order as follows:

\[
\vec{P}(t_0 + d) = \vec{P}(t_0) + \int_{t_0}^{t_0 + d} \vec{P}'(t)\,dt \qquad (3)
\]

If the frame recorded at time t0 had an exposure duration of d, the camera sensor will have performed the integration in the second term of the right-hand side of Eq. (3) by the time the frame capture is complete. This term is nothing but the actual geometric trajectory of the target, a line. Provided such a line can be extracted from the resulting image, the position of the target at time t0 + d is known with certainty. In addition, no hypothesis on the acceleration is necessary, not even its existence; continuity of the velocity is the only required hypothesis, and it is always satisfied in free-swimming zebrafish larvae as a consequence of Newton's laws of motion. Therefore, provided the illumination is such that trajectories can indeed be extracted easily, this approach has the double advantage of being both economical, since no prediction computation is necessary, and mathematically correct.

6.2. Challenges to zebrafish larva motion prediction

Model-based tracking of zebrafish makes sense when studying, for example, the fine dynamics of swimming, but it comes at a significant computing cost. Few zebrafish larva publications report using a motion prediction-based algorithm, and all that do use it to track fish as geometrical models, not as mere points. Processing video frames captured at 1500 Hz, Fontaine and colleagues [31] report an average processing time of approximately 5.5 s per frame on a PC-class machine. Because long per-frame processing times are multiplied by high frame rates, such model-based tracking would need to be several orders of magnitude faster to be of practical use in routine monitoring situations.

As an alternative, one could consider applying motion filtering to a simpler representation of the zebrafish larva, a point or a vector for example. However, this requires addressing significant issues related to the nature of the zebrafish larva's movement, issues that, as far as we know, have not been encountered in successful applications of motion prediction algorithms. These issues, which may be further complicated down the road by peculiarities exhibited by zebrafish mutant strains, include the following (a short simulation after the list illustrates the effect of rest episodes and abrupt restarts):

• Oscillatory movements. Frequent movement sequences, such as cyclic swim and many burst swim episodes, have oscillatory components that are likely to cause aliasing if not sampled at appropriate rates in both time and space (>100 Hz at VGA resolution in [36]), which in turn can heavily degrade the performance of the motion estimator by corrupting its update.

• Prevalence of rest episodes. Especially during the first week of the free-swimming stage, larvae spend a large proportion of their time not moving, as illustrated in Fig. 1H. In fact, very brief rest episodes are frequent during normal burst-and-coast motion; although brief to the naked eye, these episodes span several consecutive frames when sampled at a high enough frame rate. When motion is initiated or resumed, the distribution of fish angular velocities may be either relatively narrow (and therefore suitable for motion prediction), as during burst-and-coast swim, or quite wide, as when the fish starts an escape movement.

• Multi-modal motion. Zebrafish larvae have several very different swim modes, each of which may be initiated from a state of immobility or switched abruptly. It is not clear to what extent the predictor–corrector methods described earlier can be applied to multi-modal motion. Kalman filter-based methods usually exclude it explicitly. MCMC-based particle filters [63] assume the latent variables satisfy the Markov property: their state at time t1, conditioned on their state at time t0, depends only on that state and not on the state at any earlier moment; multi-modality usually violates this property. In this context, applying MCMC-based motion prediction methods seems far from obvious.
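To give a feel for the rest-episode and abrupt-restart issues, the following minimal simulation applies a constant-velocity extrapolation to an idealized one-dimensional burst-and-coast trajectory; the speeds, durations, and frame rate are entirely synthetic, illustrative values.

```python
import numpy as np

# Idealized 1D burst-and-coast motion: rest, an abrupt burst, rest again.
dt, n = 0.01, 300
v = np.zeros(n)
v[100:140] = 30.0        # abrupt burst initiated from immobility (mm/s)
v[200:220] = -25.0       # second burst, in the opposite direction
x = np.cumsum(v) * dt    # true positions

# Constant-velocity prediction: x_pred(k) = x(k-1) + (x(k-1) - x(k-2)).
x_pred = np.full(n, np.nan)
x_pred[2:] = 2 * x[1:-1] - x[:-2]
err = np.abs(x_pred - x)

print(err[50:100].max())  # 0.0 during rest: prediction is trivially right
print(err[100], err[200]) # jumps of 0.3 and 0.25 mm at each burst onset
```

The predictor is exact for as long as the larva rests, then fails by a full inter-frame displacement at every burst onset; with the wide angular-velocity distributions mentioned above, the corresponding two-dimensional innovation spikes are precisely where a single-mode predictor can lose its target.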

7. Conclusion

Real-time, routine tracking of individual free-swimming zebrafish larvae in group with ordinary lab equipment remains an elusive goal today. A number of approaches have been applied successfully to tracking larvae under specific conditions, and one should expect the surge of research activity in this area to bring fast progress. However, comparing systems and approaches in absolute terms is difficult because of the number of facets to the problem.

Besides the technical difficulties in comparing approaches, further difficulty results from the fact that different studies have different goals and use different methods. Many studies are interested in high-throughput, assay-type methods, in which sensitivity and specificity are the paramount concerns. Measurement errors, even identity assignment errors, may be tolerable as long as they do not jeopardize sensitivity and specificity, and tracking performance is then better measured by throughput than by accuracy. An abundant literature testifies to the fact that a number of adequate video-based tools are in routine use today. Other studies focus on the mechanics of behavior or investigate biological questions that require analytical methods; for those, resolution and accuracy, the ability to see more and to see more accurately, are the dominant concerns. Tracking accuracy requirements are high and fish identity assignment errors are not tolerable. In such studies, tracking performance is assessed by the precision of the generated measurements, and throughput is of marginal importance. Here again, in many instances adequate methods have been designed on a custom basis.

In both categories of examples, the tracking approach and its performance are validated by the end result, a publishable study, and examining the tracking in further detail may not be required. But in the rapidly expanding field of zebrafish larva behavior research, existing published studies provide only a limited perspective. What about the studies that have not been conducted yet because the tracking requirements were out of reach? Today, the frontier seems to lie where both high accuracy and high throughput are required at the same time, a situation that can be typified as follows: video-track, in real time and over unspecified lengths of time, individual zebrafish larvae in a clutch freely swimming in a Petri dish.

Tracking methods with the potential to move this frontier forward exist. Investigating their potential and comparing them to existing methods requires an examination of their video processing chains in enough detail to appreciate the issues discussed earlier in this review, which means that access to source code, calibration data, and video test sequences must be granted for the methods to be compared. Few studies adequately demonstrate the accuracy of their tracking algorithm, and many do not provide visibility at this level of detail, making validation difficult if not impossible. One reason may be that such a level of detail is usually not needed a posteriori, as we indicated earlier; another is that reliance on third-party, black-box components may render some of these details invisible. An increasing number of studies are making a special effort to provide extended access to their methods by releasing their source code as supplementary data [26,34] on an open-source basis.


This will go a long way towards helping method comparisons, but source code alone is not enough. A sample of the raw video data used to produce the presented results is needed if one wants to be in a position to investigate robustness or run comparative tests. Precedents exist in other research areas. For example, the specific needs of many image processing fields drove the development of shared image databases (see for example the collection at http://www.cs.cmu.edu/afs/cs/project/cil/ftp/html/v-images.html), many of which are highly specialized. In a similar manner, the availability of representative video sequences would facilitate the comparison of alternative approaches and would foster research cooperation. Until such databases emerge, authors who provide video examples as supplementary material might consider stating, as they often do for source code, whether they allow other teams to reuse their examples. These could constitute the kernel on which a zebrafish larva video sequence database could be built.

Acknowledgement

The authors thank Drs. Brian Grone and Louis Leung for fruitful discussions and critical reading of the manuscript. P.R.M. and Ph.M.'s work is supported by the National Institutes of Health (MH088176, NS062798, DK090065).

References

[1] R. Portugues, F. Engert, Curr. Opin. Neurobiol. 19 (2009) 644–647.
[2] J. Ninkovic, L. Bally-Cuif, Methods 39 (2006) 262–274.
[3] L.J. Kily et al., J. Exp. Biol. 211 (2008) 1623–1634.
[4] M.A. Lopez-Patino et al., Physiol. Behav. 93 (2008) 160–171.
[5] K.J. Webb et al., Genome Biol. 10 (2009) R81.
[6] A.M. Petzold et al., Proc. Natl. Acad. Sci. USA 106 (2009) 18662–18667.
[7] E.T. Larson et al., Behav. Brain Res. 167 (2006) 94–102.
[8] W.H. Norton et al., J. Neurosci. 31 (2011) 13796–13807.
[9] S. Jesuthasan, Dev. Neurobiol. 72 (2012) 395–403.
[10] C. Saverino, R. Gerlai, Behav. Brain Res. 191 (2008) 77–87.
[11] R.E. Engeszer et al., Proc. Natl. Acad. Sci. USA 105 (2008) 929–933.
[12] K.P. Mueller, S.C. Neuhauss, J. Integr. Neurosci. 11 (2012) 73–85.
[13] F. Engert, Front. Neural Circuits 6 (2012) 125.
[14] G.M. Cahill et al., Neuroreport 9 (1998) 3445–3449.
[15] I.V. Zhdanova et al., Brain Res. 903 (2001) 263–268.
[16] T. Yokogawa et al., PLoS Biol. 5 (2007) e277.
[17] L. Appelbaum et al., Proc. Natl. Acad. Sci. USA 106 (2009) 21942–21947.
[18] S. Higashijima et al., J. Comp. Neurol. 480 (2004) 1–18.
[19] J. Kaslin, P. Panula, J. Comp. Neurol. 440 (2001) 342–377.
[20] E. Kastenhuber et al., J. Comp. Neurol. 518 (2010) 439–458.
[21] K. Yamamoto et al., Mol. Cell Neurosci. 43 (2010) 394–402.
[22] T. Mueller et al., Brain Res. 1011 (2004) 156–169.
[23] C. Renier et al., Pharmacogenet. Genomics 17 (2007) 237–253.
[24] S. Guo, Expert Opin. Drug Discov. 4 (2009) 715–726.
[25] J.R. Berman et al., J. Comp. Neurol. 517 (2009) 695–710.
[26] S.D. Pelkowski et al., Behav. Brain Res. 223 (2011) 135–144.
[27] H. Dankert et al., Nat. Methods 6 (2009) 297–303.
[28] S.A. Budick, D.M. O'Malley, J. Exp. Biol. 203 (2000) 2565–2579.
[29] U.K. Muller, J.L. van Leeuwen, J. Exp. Biol. 207 (2004) 853–868.
[30] M.B. McElligott, D.M. O'Malley, Brain Behav. Evol. 66 (2005) 177–196.
[31] E. Fontaine et al., J. Exp. Biol. 211 (2008) 1305–1316.
[32] K. Fero et al., Neuromethods 52 (2011) 249–291.
[33] D.A. Prober et al., J. Neurosci. 26 (2006) 13400–13410.
[34] C.L. Cario et al., J. Physiol. 589 (2011) 3703–3708.
[35] D. Kokel et al., Nat. Chem. Biol. 6 (2010) 231–237.
[36] P. Martineau, High-Throughput Computational Analysis of Zebrafish Behavior, PhD Thesis, Université Pierre et Marie Curie, Paris VI, 2012.
[37] R. Creton, Behav. Brain Res. 203 (2009) 127–136.
[38] H.A. Burgess, M. Granato, J. Exp. Biol. 210 (2007) 2526–2539.
[39] M. Sonka et al., Image Processing, Analysis, and Machine Vision, Thompson Learning, 2008.
[40] H. Maitre, Image Processing, ISTE Wiley, 2008.
[41] K. Branson et al., Nat. Methods 6 (2009) 451–457.
[42] A.E. Nezhinsky, F.J. Verbeek, Pattern recognition for high throughput zebrafish imaging using genetic algorithm optimization, in: Proceedings of the 5th IAPR International Conference on Pattern Recognition in Bioinformatics, Springer-Verlag, 2010, pp. 301–312.
[43] C. Jiang et al., Contour Tracking Based on Intelligent Scissors and Snakes, Computer Science Department, University of California, Los Angeles, 2010.
[44] F. de Chaumont et al., Nat. Methods 9 (2012) 690–696.
[45] M.A. Luengo-Oroz et al., IEEE Trans. Image Process. 21 (2012) 2335–2340.
[46] A. Dufour et al., IEEE Trans. Image Process. 20 (2011) 1925–1937.
[47] W.J. Stewart et al., J. Exp. Biol. 216 (2013) 388–398.
[48] P. Soille, Morphological Image Analysis: Principles and Applications, Springer-Verlag, 2003.
[49] P. McClenahan et al., PLoS One 7 (2012) e32295.
[50] J. Cachat et al., Nat. Protoc. 5 (2010) 1786–1799.
[51] J. Cachat et al., PLoS One 6 (2011) e17597.
[52] A.M. Fernandes et al., Curr. Biol. 22 (2012) 2042–2047.
[53] G. Li et al., J. Exp. Biol. 215 (2012) 4015–4033.
[54] I.H. Bianco et al., Front. Syst. Neurosci. 5 (2011) 101.
[55] J. Delcourt et al., Video multitracking of fish behaviour: a synthesis and future perspectives, Fish and Fisheries 14 (2012) 186–204.
[56] S. Kato et al., J. Neurosci. Methods 134 (2004) 1–7.
[57] J. Delcourt et al., Behav. Res. Methods 41 (2009) 228–235.
[58] H.A. Burgess, M. Granato, J. Neurosci. 27 (2007) 4984–4994.
[59] R.E. Kalman, J. Basic Eng. 82 (1960) 35–45.
[60] Y. Bar-Shalom, T.E. Fortmann, Tracking and Data Association, Academic Press, 1988.
[61] S.J. Julier, J.K. Uhlmann, SPIE Proc. 3068 (1997) 182–193.
[62] M. Isard, A. Blake, Int. J. Comput. Vision 29 (1998) 5–28.
[63] Z. Khan et al., IEEE Trans. Pattern Anal. Mach. Intell. 28 (2006) 1960–1972.
[64] A.P. French, Visual Tracking: From an Individual to Groups of Animals, PhD Thesis, University of Nottingham, 2006.