ThermalTracker-3D: A thermal stereo vision system for quantifying bird and bat activity at offshore wind energy sites

PII: S1574-9541(20)30019-4
DOI: https://doi.org/10.1016/j.ecoinf.2020.101069
Reference: ECOINF 101069
To appear in: Ecological Informatics
Received date: 16 December 2019
Revised date: 12 February 2020
Accepted date: 12 February 2020


Shari Matzner*, Thomas Warfel, Ryan Hull
Pacific Northwest National Laboratory
* Corresponding author: [email protected]

Abstract

We present a new, efficient method for extracting three-dimensional animal motion trajectories from thermal stereo video data. Understanding animal behavior in the wild or other unconstrained environments is often based on animal movements. The technology described here is for understanding how bird and bat behavior is affected by the presence of wind turbines, specifically offshore wind turbines, which are challenging to monitor. There is a need for both baseline data prior to wind farm construction and post-construction data when the turbines are operating. Thermal cameras were chosen because they are equally effective both night and day. In previous work, we developed a method for generating two-dimensional images of animal motion using thermal video from a single camera. The motion track image is formed by combining a sequence of video frames into a single composite image that shows the entire flight trajectory. Here we demonstrate that the composite motion track images from a stereo pair of thermal cameras can be used directly to generate three-dimensional tracks in real time without the need for an explicit tracking algorithm. The method was evaluated using an unmanned aerial system (UAS) equipped with GPS. The UAS flew both straight and curving trajectories at distances between 50 and 325 meters from the camera system. The ThermalTracker-3D estimated positions were within ±10 meters of the GPS-derived positions in the x and y (flight height) dimensions, and within ±20 meters in the z (range) dimension for 90% of the data points. The range estimates were within the bounds of the achievable accuracy of the cameras and camera arrangement used. The results demonstrate the practical usefulness of the method for assessing collision risk to seabirds at proposed offshore wind energy sites and for quantifying avoidance behavior at operating offshore wind farms.

1 Introduction

The motivation for this work is to develop technology for observing the fine-grained movements of birds and bats around offshore wind energy sites. The ability to observe their behavior will improve our understanding of the effects of wind energy development on these animals. Potential effects are primarily collision with blades and displacement [9]. During the siting and permitting stages of a wind energy project, the magnitude of these effects must be estimated. Several different collision risk models [14] and displacement models [4] have been developed for this purpose. These models require species- and site-specific information – passage rates, avoidance rates, flight height and flight speed – that is not currently available, especially for US offshore locations. The models themselves have not been rigorously validated due to a lack of empirical data [6]. Yet the modeling plays a critical role in siting decisions and determining whether a wind energy project goes forward [1].

An important risk model factor that is difficult to quantify is avoidance behavior [16]. Three types of avoidance are defined to describe behavior at different scales. Micro-scale is last-minute avoidance of a blade, meso-scale is avoidance of individual turbines within a wind farm, and macro-scale is avoidance of the farm altogether [6]. Collision risk is generally modeled based on the number of animals flying through the rotor-swept-zone (RSZ)¹, the time an animal takes to transit through the zone and the probability of collision within the zone. Collision risk models incorporate micro- and meso-avoidance rates to reflect that some animals will avoid collision [5]. Displacement models are primarily based on macro-avoidance of the space occupied by the development.

¹ The rotor-swept-zone is a spherical volume of space that the blades of the wind turbine can occupy. It is spherical because the nacelle that houses the turbine rotates to orient the blades into the wind.

Current methods for collecting the data needed for model-based risk assessment and model validation are limited. Collecting data for risk assessment prior to development is especially challenging due to a lack of infrastructure for remote sensing. Ship-based and aerial surveys can provide information on birds present in an area, including their species and their flight height, but the observations are limited to daylight hours and good visibility. A complementary method is needed to form a more complete picture of flying fauna activity at a particular site, which may include bats, nocturnally migrating birds and nocturnal seabirds.

Current options for continuous monitoring of birds and bats at an operating offshore wind farm can be categorized as camera-based and radar-based [8]. Radar is best suited for macro-scale monitoring, due to its long range and lower resolution. Small animals may not be reliably detected, and it is very difficult to infer species from radar data alone. Camera-based systems are best for collecting micro- to meso-scale data and for identifying species, although this is still an ongoing area of research. Several camera-based systems are currently available for use offshore; three include thermal imaging for nocturnal observations and two of these could potentially be used for collecting passage rates in the RSZ [8]. However, these systems were designed primarily for collision detection and would need further development in order to provide the passage rate and avoidance data needed for collision risk model validation.

To fill the need for site-specific detailed passage rates and avoidance data, we propose a novel thermal stereo vision technology, ThermalTracker-3D, for quantifying the flight activity of birds and bats at remote locations. In previous work, we developed a method for automatically detecting birds and bats and extracting their flight tracks from thermal video [15]. The method generated two-dimensional flight track information from the video stream of a single camera, which could be used to study patterns of animal activity. The limitation of the method was mainly a lack of three-dimensional data, which is needed for collision risk modeling and better species identification. In this work, we’ve added a second camera and apply the previously developed method to the video stream from each camera, in parallel. Then the two sets of extracted two-dimensional tracks are combined using stereo vision processing to generate three-dimensional flight tracks. The flight trajectories provide passage rates through the RSZ and can be used to help identify animal species [7]. Other identifying features such as wingspan and body length can be extracted from the stereo data. The ThermalTracker-3D software automatically extracts this information in real-time so that the volume of recorded data is reduced to support long-term collection, and so that up-to-date information is continually available. The key contributions of this research are:

• A novel technique for capturing the three-dimensional trajectories of flying objects in real-time using thermal stereo vision.
• A solution for quantifying bird and bat activity around remote wind energy sites at offshore locations.

1.1 Related Work

The state of the art for 3D flying object tracking is the set of methods developed by Betke and Wu [20], [18], [21]. The objective of Betke and Wu was to simultaneously track very large numbers (hundreds to thousands) of bats (less than 30 cm in length) with enough accuracy to answer questions about the population size and about flight behavior from short (on the order of minutes) thermal video recordings of bat emergence events. Their multiple camera view method performed the detection and two-dimensional tracking stages individually for each camera and then performed stereoscopic reconstruction and epipolar neighborhood gating for efficient data association across all camera views. Object detection was accomplished with a pixel-based Gaussian intensity model to identify foreground pixels. Groups of connected pixels were then classified as objects of interest (bats, in this case). The tracking algorithm was a variation of a Kalman filter, the alpha-beta tracker. Objects were assigned to tracks (the data association step) using probabilistic gating and a greedy algorithm that favored objects with long histories. Tracking can be challenging when tracked objects are temporarily hidden from view, i.e., occluded by other objects in the scene. The multiple view method was found to better resolve occlusions arising from the density of animals than the single view (camera) method, but it increased the computational complexity so that processing each frame required three seconds. This means one minute of video recorded at 125 fps (as reported in the paper) required over six hours to process. That processing time could likely be significantly reduced with current computer technology, but this multiple object tracking method would still be a post-processing approach.

na

Our application differs from censusing bats in the following ways. Our application is long-term recording at remote locations (e.g., offshore) and there may be long periods of time with no animals present. This means that the raw video stream must be processed at the remote site in near real-time to detect when animals are present, in order to optimize data storage. At offshore locations, the majority of the observed animals will be birds, which exhibit different flight behavior than bats. Bats are also a targeted species for observation, but they will not be emerging from a cave and so will not be in dense groups like the bats studied by Betke. Although multiple birds or bats may occur simultaneously at a wind energy site, the required position accuracy for determining whether animals are at risk from wind turbines is on the order of meters. In summary, computational efficiency is more critical for remote monitoring applications than precise position accuracy. In light of these differences, we have developed a novel thermal stereo vision approach that performs real-time detection to reduce the amount of stored data, and stereo processing that trades high accuracy for computational efficiency.

A stereo vision system was developed by Huang et al. [13] for studying migrating birds in urban environments. The system used high sensitivity visible spectrum cameras to record during the day and at night, exploiting the light pollution from an urban environment. A wide two-meter baseline was used for high depth accuracy. The processing was done on the intensity images; color was not used. Birds appear light against the night sky because of the illumination from below, so the data are similar to thermal video. The stereo point matching was challenging due to the small size of many of the birds, their distance from the cameras (200 to 2500 meters) and the low signal-to-noise ratio of nighttime images. Similarly to our method, the Huang method first applies background subtraction to each video stream in the stereo pair. The foreground objects from each video stream are then formed into tracks by selecting two points from one image and one point from the other image at random, and fitting the points to a 5D bird flight model (2D initial location, 2D motion vector, 1D disparity) where the model assumes straight line flight. The Random Sample Consensus (RANSAC) algorithm is used to find the best model. Finally, a deformable parts model is used to verify 3D trajectories so that false positives are eliminated. A moving window of 5 seconds (150 frames) is processed. The method was evaluated on one 40-minute video annotated by experts that contained 86 birds. The results were precision 97.30% and recall 83.72% for detecting the birds. There was no ground truth for flight height, so the accuracy of the stereo vision processing was not quantified. There was no attempt made at species identification. The limitations of the method are the inability to handle sudden illumination changes or dynamic backgrounds such as moving clouds, and the dependence on urban light pollution for illumination. The assumption of straight line flight is valid for migrating birds, but is not valid for bats or foraging birds that may be found at wind energy sites.

Recently, a high-definition visible spectrum stereo camera system was developed specifically for monitoring birds and bats around wind energy sites [2]. The system used high-definition cameras with fish-eye lenses for a wide field of view. The system used near-infrared illumination for low light and nighttime recording. A novel on-board processing algorithm detected motion in real-time and used the motion information to reduce the volume of data stored for post-processing. The post-processing consisted of reconstructing the data and applying stereo vision to extract 3D flight tracks and was still a work in progress at the time of the report. There was some field testing of the system, which included determining that the near-infrared system was not effective beyond the immediate vicinity of the camera.

To address the limitations of the methods described above for quantifying bird and bat flight behavior, we propose the ThermalTracker-3D technology. The remainder of this paper is organized as follows: Section 2 gives the details of the new stereo vision processing, Section 3 presents the results of testing the method with an unmanned aerial system (UAS), and final conclusions are given in Section 4.

2 Thermal Stereo Vision in Real-Time

The proposed thermal stereo vision approach processes the video streams from a stereo pair of thermal cameras in near real-time. The video stream from each camera is processed in parallel to form motion track images. Stereo vision processing is then applied to each pair of motion track images, rather than each individual video frame. The result is a highly efficient 3D motion track extraction method that runs in real time.

The real-time processing pipeline (Figure 1) consists of 1) image acquisition, 2) background modeling, 3) enhanced video peak store, 4) motion track extraction, 5) stereo processing, and 6) 3D feature extraction. Each of these steps is described in detail in the following sections.

Figure 1: ThermalTracker-3D Processing Pipeline

2.1 Image Acquisition and Preprocessing

The image acquisition and preprocessing stage acquires images from each camera and applies preprocessing to optimize the quality of the images for subsequent processing. The amount and type of preprocessing applied is somewhat dependent on the particular cameras used. The preprocessing described here is appropriate for any thermal camera; additional camera-specific processing could also be introduced at this stage.

2.1.1 Synchronization

Stereo images must be acquired at precisely the same time from each camera, especially when the objective is to locate moving targets, as it is here. Some thermal cameras provide a hardware synchronization mechanism, and this is a good option. For cameras that do not, we developed a synchronization method where one camera (the “fast” camera) is operated at twice the desired frame rate. For each acquisition, the image from the fast camera with an acquisition time closest to that of the image from the other camera is selected, and the others are discarded. Even for cameras that have hardware synchronization, we check the acquisition times of the images from both cameras to ensure they are within a tolerance.
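As an illustration only, the pairing rule can be reduced to a small timestamp-matching routine. The sketch below is not the ThermalTracker-3D implementation; it assumes each grabbed frame carries a host-side acquisition timestamp, and the Frame struct, buffer type, and tolerance handling are choices made for the example.

```cpp
// sync_select.cpp -- minimal sketch of the "fast camera" pairing rule (assumed interfaces).
#include <chrono>
#include <deque>
#include <optional>

using Clock = std::chrono::steady_clock;

struct Frame {
    Clock::time_point t;   // acquisition time stamped on the host
    // ... image data (e.g., a cv::Mat) would live here ...
};

// Pick the fast-camera frame whose acquisition time is closest to the slow-camera
// frame; the chosen frame and everything older are removed from the buffer.
// Returns nothing if even the best match is outside the allowed tolerance.
std::optional<Frame> pairWithSlowFrame(std::deque<Frame>& fastBuffer,
                                       const Frame& slowFrame,
                                       std::chrono::milliseconds tolerance)
{
    if (fastBuffer.empty()) return std::nullopt;

    auto best = fastBuffer.begin();
    auto bestDiff = std::chrono::abs(best->t - slowFrame.t);
    for (auto it = fastBuffer.begin(); it != fastBuffer.end(); ++it) {
        auto diff = std::chrono::abs(it->t - slowFrame.t);
        if (diff < bestDiff) { best = it; bestDiff = diff; }
    }

    Frame chosen = *best;
    fastBuffer.erase(fastBuffer.begin(), best + 1);   // discard the selected and older frames

    if (bestDiff > tolerance) return std::nullopt;    // cameras have drifted apart
    return chosen;
}
```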

2.1.2 Bad Pixel Correction

A thermal camera has an array of individual detectors that generate the intensity at each pixel in an image. Due to limitations in materials and manufacturing processes, the detectors in the array may have slightly different responses to thermal energy, and the response may vary over time. To address this, we examine each incoming image for bad pixels and apply a correction. The correction process is based on the assumption that bad pixels are isolated, i.e., single pixels surrounded by good pixels. To detect bad pixels, a new image is created where each pixel is given the value of the distance-weighted average of its eight neighbors by convolving the original image with the following kernel:

    0.1035  0.1464  0.1035
    0.1464  0       0.1464
    0.1035  0.1464  0.1035

The original image is subtracted from the new image, and the absolute value of the difference at each pixel is compared to a threshold. Any pixels above the threshold are assumed to be either too bright or too dark relative to what is expected based on the eight-neighbor average, and those suspect values are replaced by the eight-neighbor average:

\bar{I}_{i,j} = \begin{cases} \hat{I}_{i,j}, & \text{if } |I_{i,j} - \hat{I}_{i,j}| > \tau \\ I_{i,j}, & \text{otherwise} \end{cases}     (1)

where \hat{I}_{i,j} is the eight-neighbor weighted average and \tau is the threshold. We used a threshold of 256 for cameras with 16-bit unsigned integer intensity values.
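A minimal OpenCV sketch of this correction is given below, assuming 16-bit single-channel input frames. The kernel weights and the threshold of 256 come from the text above; the function name and the float intermediate are illustrative choices, not the published implementation.

```cpp
// bad_pixel_correction.cpp -- sketch of the isolated bad-pixel filter (Eq. 1).
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>

// Replace pixels that differ from the distance-weighted average of their eight
// neighbours by more than `threshold` with that average.  `raw` is CV_16UC1.
cv::Mat correctBadPixels(const cv::Mat& raw, double threshold = 256.0)
{
    // Distance-weighted eight-neighbour kernel from the paper (centre weight 0).
    const cv::Mat kernel = (cv::Mat_<float>(3, 3) <<
        0.1035f, 0.1464f, 0.1035f,
        0.1464f, 0.0f,    0.1464f,
        0.1035f, 0.1464f, 0.1035f);

    cv::Mat img, neighbourAvg;
    raw.convertTo(img, CV_32F);                       // work in float to avoid wrap-around
    cv::filter2D(img, neighbourAvg, CV_32F, kernel,
                 cv::Point(-1, -1), 0.0, cv::BORDER_REPLICATE);

    // |I - Ihat| > threshold  ->  suspect pixel, replace with the neighbour average.
    cv::Mat suspect = cv::abs(img - neighbourAvg) > threshold;   // 8-bit mask
    neighbourAvg.copyTo(img, suspect);

    cv::Mat corrected;
    img.convertTo(corrected, raw.type());
    return corrected;
}
```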

2.1.3 Intensity Normalization

Intensity normalization preserves a consistent dynamic range under variable temperature and visibility conditions. The intensity normalization is performed by transforming pixel values from raw intensity to a z-score,

I'_{i,j} = \frac{I_{i,j} - \mu}{\sigma}     (2)

where the mean \mu and standard deviation \sigma are calculated from a sliding window of 32 frames (about one second of data). A single mean and standard deviation are estimated for overall image intensity, not for each pixel location. Note that the statistics are computed separately for the left and right video streams.
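The running z-score can be maintained with a short sliding buffer of frame statistics. The sketch below approximates the 32-frame window statistics by pooling per-frame means and variances; that shortcut, the class name, and the interface are assumptions made for illustration, not the published code.

```cpp
// intensity_normalization.cpp -- sketch of the per-stream z-score normalization (Eq. 2).
#include <opencv2/core.hpp>
#include <algorithm>
#include <cmath>
#include <deque>
#include <numeric>

// Keeps the frame-level means and variances of the last 32 frames and normalizes
// each incoming frame by the pooled statistics of that window.
class IntensityNormalizer {
public:
    explicit IntensityNormalizer(size_t window = 32) : window_(window) {}

    cv::Mat normalize(const cv::Mat& frame)            // frame: CV_32FC1
    {
        cv::Scalar mean, stddev;
        cv::meanStdDev(frame, mean, stddev);
        means_.push_back(mean[0]);
        vars_.push_back(stddev[0] * stddev[0]);
        if (means_.size() > window_) { means_.pop_front(); vars_.pop_front(); }

        const double mu = std::accumulate(means_.begin(), means_.end(), 0.0) / means_.size();
        const double var = std::accumulate(vars_.begin(), vars_.end(), 0.0) / vars_.size();
        const double sigma = std::max(std::sqrt(var), 1e-6);   // guard against division by zero

        return (frame - mu) / sigma;                    // z-score image
    }

private:
    size_t window_;
    std::deque<double> means_, vars_;
};
```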

2.2 Background Modeling

We define the background of a thermal camera’s field of view to be everything that is not a bird or a bat. We assume that animals will be relatively warm and will therefore appear brighter than the background in the thermal imagery. We also assume that the background is stationary (for the most part, see below) and that the intensity at each pixel location follows a normal distribution characterized by a mean and a standard deviation.

The background is estimated by calculating statistics for each pixel location in the corrected and normalized image streams from each camera. For each pixel, a short-term, a medium-term and a long-term mean and standard deviation are calculated. The short-term statistics are estimated from the last 24 frames (about one second), the medium-term from the last 24 x 8 = 192 frames (about 8 seconds), and the long-term from the last 192 x 48 = 9216 frames (about 5 minutes).

Short-term mean:
\mu^{s}_{i,j}(k) = \frac{1}{24} \sum_{n=k-23}^{k} I'_{i,j}(n)     (3)

Medium-term mean:
\mu^{m}_{i,j}(k) = \frac{1}{8} \sum_{n=k-7}^{k} \mu^{s}_{i,j}(n)     (4)

Long-term mean:
\mu^{\ell}_{i,j}(k) = \frac{1}{48} \sum_{n=k-47}^{k} \mu^{m}_{i,j}(n)     (5)

The short-term, medium-term and long-term standard deviations – \sigma^{s}, \sigma^{m} and \sigma^{\ell} – are calculated similarly. The assumption that the background is stationary is not always valid for outdoor scenes that may contain wind-blown vegetation, drifting clouds, and waves. Background image noise at each pixel (“known motion”) is estimated by performing a Gaussian blur on the long-term standard deviation estimate:

\tilde{\sigma}(k) = \sigma^{\ell}(k) * G     (6)

where \sigma^{\ell} is the long-term standard deviation matrix and G is a 7 x 7 Gaussian kernel.
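A compact sketch of how these cascaded statistics can be maintained is given below. The exact update cadence of the medium- and long-term levels is not spelled out in the text, so the sketch assumes each level is refreshed once per block (every 24th and 192nd frame); the struct layout and helper names are illustrative.

```cpp
// background_model.cpp -- sketch of the cascaded background statistics (Eqs. 3-6).
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <deque>

// Per-pixel mean and standard deviation over a buffer of CV_32F frames.
static void bufferStats(const std::deque<cv::Mat>& buf, cv::Mat& mean, cv::Mat& stddev)
{
    cv::Mat sum = cv::Mat::zeros(buf.front().size(), CV_32F);
    cv::Mat sumSq = cv::Mat::zeros(buf.front().size(), CV_32F);
    for (const cv::Mat& f : buf) { sum += f; sumSq += f.mul(f); }
    const double n = static_cast<double>(buf.size());
    mean = sum / n;
    cv::Mat variance = sumSq / n - mean.mul(mean);
    cv::sqrt(cv::max(variance, 0.0), stddev);           // clamp tiny negatives from round-off
}

static cv::Mat bufferAverage(const std::deque<cv::Mat>& buf)
{
    cv::Mat sum = cv::Mat::zeros(buf.front().size(), CV_32F);
    for (const cv::Mat& f : buf) sum += f;
    return sum / static_cast<double>(buf.size());
}

struct BackgroundModel {
    long frameCount = 0;
    std::deque<cv::Mat> frames, shortMeanBuf, shortStdBuf, medMeanBuf, medStdBuf;
    cv::Mat shortMean, shortStd, medMean, medStd, longMean, longStd, knownMotion;

    void update(const cv::Mat& frame)                    // frame: CV_32FC1, preprocessed
    {
        ++frameCount;
        frames.push_back(frame.clone());
        if (frames.size() > 24) frames.pop_front();
        if (frames.size() < 24) return;
        bufferStats(frames, shortMean, shortStd);        // Eq. 3 (last 24 frames)

        if (frameCount % 24 != 0) return;                // cascade once per 24-frame block
        shortMeanBuf.push_back(shortMean.clone());
        shortStdBuf.push_back(shortStd.clone());
        if (shortMeanBuf.size() > 8) { shortMeanBuf.pop_front(); shortStdBuf.pop_front(); }
        if (shortMeanBuf.size() < 8) return;
        medMean = bufferAverage(shortMeanBuf);           // Eq. 4 (8 blocks = 192 frames)
        medStd  = bufferAverage(shortStdBuf);

        if (frameCount % 192 != 0) return;               // cascade once per 192-frame block
        medMeanBuf.push_back(medMean.clone());
        medStdBuf.push_back(medStd.clone());
        if (medMeanBuf.size() > 48) { medMeanBuf.pop_front(); medStdBuf.pop_front(); }
        if (medMeanBuf.size() < 48) return;
        longMean = bufferAverage(medMeanBuf);            // Eq. 5 (48 blocks = 9216 frames)
        longStd  = bufferAverage(medStdBuf);
        cv::GaussianBlur(longStd, knownMotion, cv::Size(7, 7), 0.0);   // Eq. 6 ("known motion")
    }
};
```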

2.3 Enhanced Video Peak Store

This stage in the processing generates the composite peak value images and a mask indicating those pixels in the composite image that potentially correspond to a bird or bat. This stage is performed in parallel on the preprocessed image stream from each camera. The process is called Enhanced Video Peak Store because the composite images are formed by setting the intensity at each pixel in the composite image to the peak value observed at that pixel location over time,

P_{i,j}(k_0) = \max_{k_0 \le k < k_0 + K} I'_{i,j}(k)     (7)

where k_0 is the first frame in the peak value image and K is the number of frames used to form the image. The length of time used to form the composite images is generally set to 10-12 seconds (250 - 360 frames, depending on frame rate). This is a few seconds longer than the length of time it would take a bird flying at 6 m/s (13.4 mph) to cross the camera’s field of view at a range of 150 meters for a camera with a 25 deg. field of view. The windows are overlapped by 50% so that when an animal flies through the field of view, at least one composite peak value image will contain the entire motion track.
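The peak-value composite and its temporal channel can be maintained with two running images, as in the sketch below. The struct and member names are illustrative; in the full pipeline two such accumulators would run over 50%-overlapped windows of roughly 250-360 frames, one per camera.

```cpp
// peak_store.cpp -- sketch of the Enhanced Video Peak Store composite image (Eq. 7).
#include <opencv2/core.hpp>
#include <limits>

// Accumulates the per-pixel peak intensity over a window of frames, together with
// the frame index at which each peak occurred (the temporal channel used later for
// motion track extraction).
struct PeakStore {
    cv::Mat peak;        // CV_32FC1, running maximum of I'(k)
    cv::Mat peakIndex;   // CV_32SC1, frame index k at which the maximum occurred

    void reset(const cv::Size& size)
    {
        peak = cv::Mat(size, CV_32F, cv::Scalar(-std::numeric_limits<float>::max()));
        peakIndex = cv::Mat(size, CV_32S, cv::Scalar(-1));
    }

    void add(const cv::Mat& frame, int frameIdx)         // frame: CV_32FC1, normalized
    {
        if (peak.empty()) reset(frame.size());
        cv::Mat isNewPeak = frame > peak;                 // 8-bit mask where the frame exceeds the peak
        frame.copyTo(peak, isNewPeak);
        peakIndex.setTo(frameIdx, isNewPeak);
    }
};
```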

The peak value image is compared to the background model to generate a mask that is used in the next stage to extract tracks from the peak value image. A channel is added to the peak value image that contains the frame indices at which the peak value of the corresponding pixel occurred. This temporal information is used by the next stage for extracting motion tracks. The mask indicates pixel values that are brighter than the local average and is generated as follows.

M_{i,j}(k) = \begin{cases} 1, & \text{if } \delta_{i,j}(k) > \theta(k) \\ 0, & \text{otherwise} \end{cases}     (8)

\theta(k) = \min_{i,j} \delta_{i,j}(k) + \tfrac{2}{5}\left(\max_{i,j} \delta_{i,j}(k) - \min_{i,j} \delta_{i,j}(k)\right)     (9)

where \delta_{i,j}(k) is the difference between the current pixel value and the short-term average (Eq. 3) plus the blurred long-term standard deviation (Eq. 6):

\delta_{i,j}(k) = I'_{i,j}(k) - \left(\mu^{s}_{i,j}(k) + \gamma(k)\,\tilde{\sigma}_{i,j}(k)\right)     (10)

and \gamma(k) is a dynamic scaling factor:

\gamma(k) = \frac{\sum_{i,j} \rho_{i,j}(k)}{\sum_{i,j} c_{i,j}(k)}     (11)

\rho_{i,j}(k) = \begin{cases} \sigma^{s}_{i,j}(k) / \tilde{\sigma}_{i,j}(k), & \text{if } \tilde{\sigma}_{i,j}(k) > 0 \\ 0, & \text{otherwise} \end{cases}     (12)

c_{i,j}(k) = \begin{cases} 1, & \text{if } \sigma^{s}_{i,j}(k) > 0 \text{ and } \tilde{\sigma}_{i,j}(k) > 0 \\ 0, & \text{otherwise} \end{cases}     (13)

The dynamic scaling and thresholding dampen the effects of sudden gain changes in the scene.

2.4 Motion Track Extraction

The motion track extraction stage segments a peak value image into foreground blobs that correspond to a bird or bat, and then connects the blobs into tracks. (This is essentially the same as the algorithms described in [15], Section 2.2.2.) Blobs – groups of connected pixels – are formed starting with the pixels in the mask generated in the Enhanced Video Peak Store stage. A pixel with a value of 1 in the mask is selected as a seed to grow a new blob. All the neighboring pixels (in an 8-connected neighborhood) of the seed pixel whose peak value occurred in the same frame as the seed pixel are added to the blob. The blob continues to grow as neighboring pixels of the newly added pixels with the same peak value frame index are added, until there are no more neighboring pixels to add. Note that this process uses only temporal information – the peak value frame index – and not intensity. A blob is represented as a data structure that contains the list of pixels, their intensities, the frame index, and the bounding box.

Once all the pixels from the mask have been assigned to blobs, the blobs are assigned to tracks. The list of blobs is sorted from earliest to latest based on their frame indices. The earliest blob starts a new track. The track algorithm grows the track by adding the spatially nearest blob and enforces that the frame indices monotonically increase over the spatial extent of the track. If more than one blob is near the last blob assigned to the track, then the size and intensity of the blobs are used to determine which is most similar to the blobs already assigned. This continues until all blobs are assigned to tracks. A list of active tracks is maintained across successive peak value images. A track is considered complete when no new blobs have been added over a certain number of frames. Tracks are stored as ordered lists of blobs.
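The temporal flood fill that grows blobs from the mask can be sketched as below. This is a simplified stand-in for the published algorithm: it groups 8-connected mask pixels that share the same peak-frame index and omits the subsequent track-assembly step (sorting blobs by frame index and linking each to its nearest spatial neighbour); names and data structures are illustrative.

```cpp
// blob_extraction.cpp -- sketch of growing blobs from the peak-store mask by frame index.
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <stack>
#include <vector>

struct Blob {
    int frameIndex = -1;
    std::vector<cv::Point> pixels;
    cv::Rect boundingBox;
};

// Group mask pixels into 8-connected blobs whose peak values occurred in the same frame.
// `mask` is CV_8U (nonzero = candidate pixel), `peakIndex` is CV_32S.
std::vector<Blob> extractBlobs(const cv::Mat& mask, const cv::Mat& peakIndex)
{
    std::vector<Blob> blobs;
    cv::Mat visited = cv::Mat::zeros(mask.size(), CV_8U);

    for (int r = 0; r < mask.rows; ++r) {
        for (int c = 0; c < mask.cols; ++c) {
            if (!mask.at<uchar>(r, c) || visited.at<uchar>(r, c)) continue;

            Blob blob;
            blob.frameIndex = peakIndex.at<int>(r, c);
            std::stack<cv::Point> frontier;
            frontier.push({c, r});
            visited.at<uchar>(r, c) = 1;

            while (!frontier.empty()) {                  // temporal flood fill
                cv::Point p = frontier.top(); frontier.pop();
                blob.pixels.push_back(p);
                for (int dr = -1; dr <= 1; ++dr)
                    for (int dc = -1; dc <= 1; ++dc) {
                        int nr = p.y + dr, nc = p.x + dc;
                        if (nr < 0 || nc < 0 || nr >= mask.rows || nc >= mask.cols) continue;
                        if (visited.at<uchar>(nr, nc) || !mask.at<uchar>(nr, nc)) continue;
                        if (peakIndex.at<int>(nr, nc) != blob.frameIndex) continue;
                        visited.at<uchar>(nr, nc) = 1;
                        frontier.push({nc, nr});
                    }
            }
            blob.boundingBox = cv::boundingRect(blob.pixels);
            blobs.push_back(std::move(blob));
        }
    }
    return blobs;
}
```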

2.5 Stereo Processing

Stereo vision processing consists of matching points from one image of a stereo pair with points in the other image and using epipolar geometry to find the real-world location of the object corresponding to the matched pair of image points [11]. A calibration procedure is used to estimate the parameters of the stereo vision system. The accuracy of the 3D position estimation is directly influenced by the quality of the calibration and is also influenced by the accuracy and precision of the point matching.

2.5.1 Calibration

The calibration procedure is performed independently of the real-time processing pipeline. The calibration estimates the intrinsic and extrinsic parameters of the stereo pair of cameras. The intrinsic parameters describe the individual camera characteristics – focal length, resolution and lens distortion. The extrinsic parameters describe the geometric transformation from one camera’s coordinate system to the other in the real world coordinate frame – rotation and translation in three dimensions. Together, these parameters are used to calculate an object’s location in world space from the object’s location in each camera’s image.

2.5.2 Point Matching

Our point matching algorithm operates on the track data extracted from each camera. This is the key innovation of the proposed stereo vision method: the point matching, which can be challenging in thermal imagery due to its low resolution, is performed based on the assumption that the centroids of a stereo pair of blobs correspond to the same real-world point on the object associated with the blobs. The centroid of a blob is

c_x = \frac{\sum_{(i,j) \in B} j\,I'_{i,j}(k)}{s}, \quad c_y = \frac{\sum_{(i,j) \in B} i\,I'_{i,j}(k)}{s}     (14)

where B is the set of pixels in the blob, k is the frame index of the blob and

s = \sum_{(i,j) \in B} I'_{i,j}(k).

Note that the centroid is in image plane coordinates, not pixel indices. The image plane coordinates are continuous, unlike the pixel indices, which are discrete. Thus using the centroids gives subpixel precision for the point matching. First, a track from one camera is matched to a track from the other camera that occurred at the same time. In the case of multiple tracks occurring at the same time, the vertical position of the tracks in the image is used to match corresponding tracks from each camera. The centroids of each pair of co-occurring blobs from the two cameras’ tracks are assumed to correspond to the same point on the animal. This simplifying assumption has two benefits: 1) the complexity of this point matching approach is significantly lower than the methods used by Wu et al. [21] and by Huang et al. [13] and can be executed in real-time, and 2) the use of the centroid provides sub-pixel precision which increases the accuracy of the subsequent 3D location estimation. This point-matching and triangulation processing generates an initial, noisy sequence of estimated 3D coordinates for each blob in the track, [X̂, Ŷ, Ẑ].
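A sketch of the centroid computation and the per-blob triangulation is shown below. It assumes the centroids have already been undistorted into the coordinate frames of the two projection matrices (e.g., with cv::undistortPoints) and lets pixel coordinates stand in for the image-plane coordinates used in the paper; the function names are illustrative.

```cpp
// stereo_points.cpp -- sketch of centroid point matching and triangulation (Eq. 14).
#include <opencv2/core.hpp>
#include <opencv2/calib3d.hpp>
#include <vector>

// Intensity-weighted centroid of a blob, in sub-pixel image coordinates.
// `peak` is the CV_32F peak-value image the blob was extracted from.
cv::Point2f blobCentroid(const std::vector<cv::Point>& pixels, const cv::Mat& peak)
{
    double sum = 0.0, cx = 0.0, cy = 0.0;
    for (const cv::Point& p : pixels) {
        const double v = peak.at<float>(p);
        sum += v; cx += p.x * v; cy += p.y * v;
    }
    return { static_cast<float>(cx / sum), static_cast<float>(cy / sum) };
}

// Triangulate one pair of co-occurring blob centroids using the 3x4 projection
// matrices produced by stereo calibration/rectification.
cv::Point3d triangulateCentroids(const cv::Mat& P1, const cv::Mat& P2,
                                 const cv::Point2f& left, const cv::Point2f& right)
{
    std::vector<cv::Point2f> l{left}, r{right};
    cv::Mat X;                                           // 4x1 homogeneous result
    cv::triangulatePoints(P1, P2, l, r, X);
    X.convertTo(X, CV_64F);
    const double w = X.at<double>(3, 0);
    return { X.at<double>(0, 0) / w, X.at<double>(1, 0) / w, X.at<double>(2, 0) / w };
}
```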

2.5.3 Track Refinement

The initial sequence of 3D coordinates is further refined by enforcing that, over the course of the track, the distance between the animal and the camera system varies smoothly. The validity of this assumption depends on the frame rate of the video stream and the speed at which an animal is traveling. Using the initial sequence of 3D positions, the distance from the camera is calculated as a function of time,

\hat{d}(t_k) = \sqrt{\hat{X}(t_k)^2 + \hat{Y}(t_k)^2 + \hat{Z}(t_k)^2}     (15)

where t_k is the time when the frame k containing the blob was recorded. The first derivative of the function is approximated by differencing at each point,

\hat{d}'(t_k) = \frac{\hat{d}(t_k) - \hat{d}(t_{k-1})}{t_k - t_{k-1}}     (16)

Any points where the derivative is more than four standard deviations above the mean are considered outliers. The distance at those points is replaced with the linear interpolation of the values before and after it,

\hat{d}(t_k) = \begin{cases} L\!\left(\hat{d}(t_{k-1}), \hat{d}(t_{k+1})\right), & \text{if } \hat{d}'(t_k) > \mu + 4\sigma \\ \hat{d}(t_k), & \text{otherwise} \end{cases}     (17)

where \mu and \sigma are the mean and standard deviation of \hat{d}' and L(\cdot) is linear interpolation. Then the entire distance sequence is smoothed with a low-pass filter. Finally, refined coordinates are generated as

\hat{X}' = \tilde{d} \sin\theta     (18)
\hat{Y}' = \tilde{d} \sin\phi     (19)
\hat{Z}' = \tilde{d} \cos\theta \cos\phi     (20)

where \tilde{d} is the smoothed distance, \theta = \arcsin(\hat{X}/\hat{d}) and \phi = \arcsin(\hat{Y}/\hat{d}). The time index in the above equations was omitted for readability.
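A sketch of this refinement is given below; the midpoint average stands in for the linear interpolation L(·), and the final low-pass filtering step is only indicated by a comment, since the particular filter is not specified in the text. The struct and function names are illustrative.

```cpp
// track_refinement.cpp -- sketch of the range-smoothing step (Eqs. 15-20).
#include <cmath>
#include <vector>

struct TrackPoint { double t, X, Y, Z; };

// Replace range outliers (derivative more than four standard deviations above the
// mean) by interpolation of the neighbouring ranges, then re-project each position
// onto the adjusted range along its original viewing direction.
void refineTrack(std::vector<TrackPoint>& track)
{
    const size_t n = track.size();
    if (n < 3) return;

    std::vector<double> d(n), dd(n, 0.0);
    for (size_t k = 0; k < n; ++k)
        d[k] = std::sqrt(track[k].X * track[k].X + track[k].Y * track[k].Y +
                         track[k].Z * track[k].Z);                       // Eq. 15
    for (size_t k = 1; k < n; ++k)
        dd[k] = (d[k] - d[k - 1]) / (track[k].t - track[k - 1].t);       // Eq. 16

    double mean = 0.0, var = 0.0;
    for (double v : dd) mean += v;
    mean /= n;
    for (double v : dd) var += (v - mean) * (v - mean);
    const double sigma = std::sqrt(var / n);

    std::vector<double> smoothed = d;
    for (size_t k = 1; k + 1 < n; ++k)
        if (dd[k] > mean + 4.0 * sigma)
            smoothed[k] = 0.5 * (d[k - 1] + d[k + 1]);                   // Eq. 17 (midpoint stand-in for L)

    // A low-pass filter over `smoothed` would be applied here in the full pipeline.

    for (size_t k = 0; k < n; ++k) {                                     // Eqs. 18-20
        if (d[k] <= 0.0) continue;
        const double theta = std::asin(track[k].X / d[k]);
        const double phi   = std::asin(track[k].Y / d[k]);
        track[k].X = smoothed[k] * std::sin(theta);
        track[k].Y = smoothed[k] * std::sin(phi);
        track[k].Z = smoothed[k] * std::cos(theta) * std::cos(phi);
    }
}
```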

2.6 3D Feature Extraction

For each track, the timestamps and 3D coordinates of the animal positions along the track are used to calculate features that can be used to identify the species of the animal. For example, the shape of the flight track – straight line, swooping, etc. – can be used to discriminate between certain species [7]. The size and speed of a tracked object can be estimated and used to identify species and also to filter out objects like aircraft and boats.

3 Evaluation Using a Controlled Target

The ThermalTracker-3D algorithms were implemented in C++ using the OpenCV [3] and Eigen [10] libraries, and the software was built for the Ubuntu operating system. The software was then evaluated using an unmanned aerial system (UAS) as a controlled target. The testing took place at the National Renewable Energy Laboratory’s Flatiron campus, also known as the National Wind Technology Center (NWTC). The system set up for the evaluation consisted of a stereo pair of FLIR A65 thermal cameras (Table 1) mounted on a custom platform designed for the test campaign and a System 76 Oryx Pro laptop (Table 2) running the ThermalTracker-3D software (Figure 2a).

3.1 System Calibration

A calibration process was developed that used a pattern of circles in a grid on a hand-held board, where the circles and the background of the board had different thermal properties. The calibration pattern used for this test campaign was constructed from plywood covered in radar absorbent material with circular foil stickers placed in a 3 x 4 grid pattern (Figure 2b). The calibration process consisted of holding up the pattern within the cameras’ fields of view, at different positions and tilted at different angles. At each pattern position, the images from each camera were stored to form a set of calibration image pairs. Generally, 12 to 15 or more pairs were recorded at distances between 10 and 20 meters from the cameras. The stereo calibration routines in OpenCV were used to process the image pairs and generate the intrinsic and extrinsic parameters needed for stereo vision. The calibration process was performed just prior to every video recording session, once the system had been set up, and again post recording.
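The OpenCV-based procedure can be sketched as follows. The sketch assumes the thermal frames have already been converted to 8-bit for the blob detector inside cv::findCirclesGrid, and it uses the commonly recommended two-stage route (per-camera calibration first, then stereo calibration with fixed intrinsics); this is an illustrative choice, not necessarily the exact sequence used for the reported results.

```cpp
// stereo_calibration.cpp -- sketch of circle-grid stereo calibration with OpenCV.
#include <opencv2/core.hpp>
#include <opencv2/calib3d.hpp>
#include <vector>

// Calibrate a stereo pair from synchronized 8-bit image pairs of a circle-grid target.
// `patternSize` is circles per row x circles per column; `spacing` is the
// centre-to-centre distance of the circles in metres.
bool calibrateStereoPair(const std::vector<cv::Mat>& leftImgs,
                         const std::vector<cv::Mat>& rightImgs,
                         cv::Size patternSize, float spacing, cv::Size imageSize,
                         cv::Mat& K1, cv::Mat& D1, cv::Mat& K2, cv::Mat& D2,
                         cv::Mat& R, cv::Mat& T)
{
    // Known planar 3D layout of the grid (Z = 0).
    std::vector<cv::Point3f> grid;
    for (int r = 0; r < patternSize.height; ++r)
        for (int c = 0; c < patternSize.width; ++c)
            grid.emplace_back(c * spacing, r * spacing, 0.0f);

    std::vector<std::vector<cv::Point3f>> objectPoints;
    std::vector<std::vector<cv::Point2f>> leftPoints, rightPoints;

    for (size_t i = 0; i < leftImgs.size(); ++i) {
        std::vector<cv::Point2f> l, r;
        const bool okL = cv::findCirclesGrid(leftImgs[i], patternSize, l);
        const bool okR = cv::findCirclesGrid(rightImgs[i], patternSize, r);
        if (!okL || !okR) continue;                      // skip pairs where detection failed
        objectPoints.push_back(grid);
        leftPoints.push_back(l);
        rightPoints.push_back(r);
    }
    if (objectPoints.size() < 10) return false;          // too few usable pairs

    std::vector<cv::Mat> rvecs, tvecs;
    cv::calibrateCamera(objectPoints, leftPoints, imageSize, K1, D1, rvecs, tvecs);
    cv::calibrateCamera(objectPoints, rightPoints, imageSize, K2, D2, rvecs, tvecs);

    cv::Mat E, F;
    const double rms = cv::stereoCalibrate(objectPoints, leftPoints, rightPoints,
                                           K1, D1, K2, D2, imageSize, R, T, E, F,
                                           cv::CALIB_FIX_INTRINSIC);
    return rms < 1.0;                                     // crude sanity check on reprojection error (px)
}
```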

3.2 Controlled Target Flights

The UAS was a FireFly6 Pro (BirdsEyeView Aerobotics) operated by Red Mountain Scientific (Fort Collins, CO). The UAS was a fixed-wing aircraft approximately the size of a large bird (Table 3) outfitted with a real-time kinematic (RTK) positioning system. RTK works with GPS to provide more accurate positioning than GPS alone, with accuracy generally within a few centimeters. However, the GPS-RTK system suffered from unfavorable atmospheric conditions at times and interference at the test site, so not all the reported GPS positions were RTK-corrected.

Table 1: Camera Specifications
Model: FLIR A65
Detector: uncooled microbolometer
Spectral range: 7.5 - 13 µm
Pixel pitch: 17 µm
Resolution: 640 x 512
Focal length: 25 mm
Field of view: 24.6 x 18.5 deg.

Table 2: Computer Specifications
Model: System 76 Oryx Pro laptop
Processor: i7-7820HK (2.9 up to 3.9 GHz; 8 MB cache; 4 cores; 8 threads)
Memory: 32 GB dual-channel DDR4 at 2400 MHz (2x 16 GB)
Storage: 1 TB M.2 SSD

Figure 2: Equipment used for recording controlled target data.

Table 3: UAS Specifications
Model: FireFly6 Pro
Style: fixed wing with articulated props
Size: wingspan 1.5 m, length 0.8 m
Outer material: EPO foam
Minimum speed: 6 m/s (13.4 mph)
Maximum speed: 18 m/s (40.2 mph)
Maximum operational wind speed: steady 8 m/s (18 mph), gusts up to 12 m/s (27 mph)

The flight tests consisted of the UAS flying continuously for about 15 - 35 minutes at a time, moving through the camera system’s field of view in various patterns mimicking bird flight. The patterns included straight line flight, spiraling upwards like a soaring raptor and swooping like a seabird. The position of the UAS was recorded at 10 Hz throughout the duration of each flight. The UAS in flight was generally between 50 and 300 meters from the camera system. For context, the average lower and upper bounds of the RSZ of installed offshore wind turbines in 2018 were 37.5 meters and 178.5 meters². The largest offshore wind turbine, the GE Haliade-X 12 MW turbine, will have an RSZ between 40 meters and 260 meters, assuming a hub height of 150 meters.

² The Fraunhofer Institute for Energy Economics and Energy System Technology, retrieved from http://windmonitor.iee.fraunhofer.de/windmonitor_en/3_Onshore/2_technik/4_anlagengroesse on Feb. 6, 2020.

3.3 Data Processing

In order to compare the ThermalTracker-3D output to the UAS position data, some processing was required. First, the times when the UAS was in the camera system’s field of view were determined. Then the UAS position data and ThermalTracker-3D position estimates were transformed into a common coordinate frame. Finally, the UAS position data was interpolated to get position estimates that coincided with the times of the ThermalTracker-3D position data.

The GPS coordinates from the UAS were in degrees latitude, longitude and altitude relative to the center of the earth. The ThermalTracker-3D output was in Cartesian coordinates relative to the center of the image plane of the left camera. Using the camera position, bearing and elevation data that was recorded at the start and end of each UAS flight session, both the UAS and ThermalTracker-3D positions were transformed into a common Cartesian East-North-Up reference frame centered on the surface of the earth (Figure 3). In this coordinate frame, the Y dimension is the flight height and the Z dimension is the ground range of the target.

Figure 3: The ThermalTracker-3D position data and the UAS position data were transformed into a common coordinate frame.

The resulting dataset comprised 191 flight tracks through the camera system’s field of view with a total of 20,440 position data points. Each position point was a vector [i, t, X, Y, Z, X̂, Ŷ, Ẑ] where i is the track id; t is the time the position was calculated for; X, Y, Z are the GPS-derived positions, as described in the preceding paragraph; and X̂, Ŷ, Ẑ are the estimated positions from the ThermalTracker software. The points sample most of the camera system’s field of view (Figure 4). About 70% of the data points were greater than 150 meters away in the Z dimension (Figure 5).

Figure 4: The spatial distribution of the UAS position data covers most of the camera system’s field of view.

Figure 5: Distribution of the range (Z dimension, in meters) of the UAS position data.

Figure 6: Example UAS flight track recorded by ThermalTracker-3D.

3.4 Results and Discussion

For each point the GPS-derived and ThermalTracker-3D estimated distance was calculated,

d = \sqrt{X^2 + Y^2 + Z^2}, \qquad \hat{d} = \sqrt{\hat{X}^2 + \hat{Y}^2 + \hat{Z}^2}.     (21)

The GPS-derived distances ranged out to 337 meters with an average of 186.5 meters. The distance estimation error e_d = d̂ − d was calculated to summarize the overall accuracy of the ThermalTracker-3D position estimates (Figure 7). The median error was 4.082 meters, and the first and third quartiles were -5.881 and 13.905 meters, respectively.

Figure 7: GPS-derived vs. Estimated Distance

The position error was calculated for each dimension, e_x = X̂ − X, e_y = Ŷ − Y and e_z = Ẑ − Z. Overall, the estimated positions agreed well with the GPS-derived positions (Table 4). There were several points with error values that were outliers in at least one dimension, where an outlier is defined as a value greater than 1.5 times the interquartile range. The data points associated with the outlier error values were labeled as “bad” points. There were 1639 bad points (8% of the total points). Of the 191 flight tracks, 101 (53%) had no bad points and 137 (72%) had 10 or fewer bad points (Figure 8). The average track length was 107 points.

Table 4: Estimation Error Statistics (meters)

           10th Percentile   25th Percentile   Median    Mean     75th Percentile   90th Percentile
X̂ − X      -7.302            -5.389            -2.052    -1.273    2.241             6.061
Ŷ − Y      -0.300             1.001             3.194     4.302    6.826            10.675
Ẑ − Z     -17.237            -6.453             2.969     2.214   11.564            19.863
d̂ − d     -15.536            -5.265             3.720     3.446   12.700            21.333

Figure 8: ThermalTracker-3D Extracted UAS Flight Tracks (in chronological order)

The ThermalTracker estimated positions showed very little bias (Figures 9 - 11). The uncertainty in the estimated flight height increased with distance (Figure 12) because the size in pixels of the target gets smaller which, in turn, reduces the accuracy of the point matching. The mean error in each dimension remains close to zero but the uncertainty (spread) increases with distance from the camera system. This also means that the position estimate for a smaller target would have more uncertainty than the estimate for a larger target at the same distance. For example, at 200 meters from the cameras, a bird with a 1 meter wingspan would occupy a little over 5 pixels in each camera’s image. A bird with a 0.6 meter wingspan (e.g., Common Murre) would occupy the same number of pixels at 120 meters from the cameras. Thus the accuracy of the position estimate of the smaller bird when it was 120 meters away would be similar to that of the estimate for the larger bird at 200 meters.

Figure 10: GPS-derived vs. Estimated Y (Flight Height)

Figure 11: GPS-derived vs. Estimated Z (Range)

Figure 12: The accuracy of the flight height Y estimates is range-dependent because the size in pixels of the target gets smaller as range increases.

3.4.1 Sources of Uncertainty

The uncertainty in 3D position estimates from stereo vision processing, in general, is determined by the baseline separation between the cameras, the quality of the calibration and the accuracy of the point matching. The precision of the point matching is limited by the resolution and field of view of the cameras. The GPS-derived positions also contain uncertainty. The RTK correction was not always available, as noted earlier, due to interference from other sensors in the area. When the GPS switched between RTK-corrected and uncorrected modes, a sudden jump in position would occur (Figure 13). Uncorrected GPS positions generally have at worst 4 meter RMS horizontal accuracy. Vertical accuracy is worse.

Figure 13: An example flight track where changing GPS modes from RTK-corrected to uncorrected results in a discontinuity in the track.

For the 1-meter baseline separation and the cameras used here, an error in the point matching by a single pixel would cause an error in the estimated Z dimension of 27.3 meters for an object 200 meters away. At a range of 200 meters, an object 1 meter in length (about the length of the UAS or a goose-sized bird) occupies 5.2 pixels in an image from the cameras used for this work, making matching a specific point on the object such as a wingtip or beak difficult. We were able to achieve subpixel precision for the point matching by using the center of mass of the matching thermal blobs calculated as a floating point number. The accuracy of the ThermalTracker-3D range estimates was within the theoretical limits of the prototype stereo camera system.
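The single-pixel figure quoted above is consistent with the standard narrow-baseline stereo error approximation. Using nominal values for this setup (baseline B = 1 m, 25 mm focal length and 17 µm pixel pitch, i.e., a focal length of roughly f ≈ 1470 pixels), a point-matching (disparity) error of \Delta d = 1 pixel at a range of Z = 200 m gives

\Delta Z \approx \frac{Z^2}{f\,B}\,\Delta d = \frac{(200\ \mathrm{m})^2}{1470 \times 1\ \mathrm{m}} \times 1 \approx 27\ \mathrm{m},

in line with the 27.3 m value stated above.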

The spatial distribution of the bad points in the system’s field of view was more dense towards the edges of the field of view (Figure 14). This may indicate that the calibration process did not fully characterize the lens distortion in those regions of the field of view. The bad points near the edges may also be due to the vignetting effect of the A65 cameras, where the images were darker around the edges. This was corrected for during the image acquisition stage, but the correction may have altered the center of mass of blobs near the edges and therefore reduced the accuracy of the point matching in those regions.

Figure 14: Spatial Distribution of Bad Data Points

Some bad points (outliers) were due to self-occlusion, meaning that blobs of a single target are occluded by subsequent blobs from the same target in the composite motion track image (Figure 15). Self-occlusion occurs when there is target motion towards or away from the camera system. Self-occlusion can also occur if the target is moving slowly such that its motion between frames is less than its own length. As can be seen in the example (Figure 15), the flight is tracked correctly in the 2D image from each camera, but the point matching step breaks down due to the overlapping blobs. One potential solution for correcting this issue is to reduce the frame rate so that the blobs do not overlap. We processed the example flight track at half the frame rate and the accuracy was improved (Figure 16).

Figure 15: Flight Track with Self-Occlusion

Figure 16: An example flight track where reducing the frame rate from 30 fps (full) to 15 fps (half) improves the accuracy of the estimated positions along the track. The reduced frame rate reduces the amount of self-occlusion of consecutive blobs.

4 Conclusion

We have presented ThermalTracker-3D, a technology for providing site-specific detailed passage rates and avoidance data for collision risk modeling and model validation at offshore wind energy sites, both proposed undeveloped sites and operating wind farms. The ThermalTracker-3D software extracts 3D flight tracks from a stereo thermal camera system in real-time. In previous work, the performance of the ThermalTracker detection algorithm was evaluated using video data of wild fauna – gulls, terns, swallows, bats – recorded in coastal settings under various weather conditions [15]. Here, the 3D tracking accuracy was evaluated using a GPS-equipped UAS as a controlled target. The ThermalTracker-3D estimated positions were within ±10 meters of the GPS-derived positions in the x and y (flight height) dimensions, and within ±20 meters in the z dimension for 90% of the data points at distances between 50 and 300 meters. This level of accuracy is comparable to that of human observers [12] and is sufficient for characterizing bird and bat flight activity relative to the RSZ of offshore wind turbines. For example, the rotor diameter of the 6 MW turbines installed at the Block Island offshore project in the U.S. is 150 meters and the diameter of the GE 12 MW Haliade-X is 220 meters.

For remote offshore locations being considered for wind energy development, there is very little data on bird flight heights except that from ship-based and aerial surveys, which are generally limited to daylight hours and fair weather [17]. The proposed method has the capability to provide a more comprehensive picture of all bird and bat activity for better collision risk assessment to inform wind energy development. Future work will include characterizing the performance of the system at an offshore location from a floating platform such as a buoy or moored barge. For operation on a buoy, a mechanical stabilization system will be used to mitigate the effect of wave motion. The cameras will be positioned to look upwards at the sky, to maximize the area of the RSZ in the field of view and to eliminate the potential clutter of wind waves. The offshore evaluation will characterize the system performance over a wide range of weather conditions. Heavy precipitation and thick fog are expected to reduce efficacy; to what degree remains to be seen.

For quantifying avoidance behavior at operating wind energy sites, the proposed method could be further refined in two areas. First, the position accuracy could be improved to be closer to ±5 meters in all dimensions (3% of the rotor diameter of 6 MW turbines, 1.9% of the rotor diameter of 12 MW turbines). Second, the algorithms must be modified to handle moving turbine blades in the field of view. We believe that the estimated position accuracy of the current system can be improved by employing a more rigorous calibration process, such as that described in [19]. The calibration process used for the results reported here did not fully characterize the field of view of both cameras and, as a result, projected points outside the region of calibration are more uncertain. Adding a third camera with another perspective would also improve the accuracy at the cost of increasing the system complexity, but the trade-off may be justified. As to the second refinement, during the course of the testing we recorded thermal video data with moving turbine blades in the field of view, so we now have a dataset suitable for making the ThermalTracker-3D algorithms robust to moving blades.

In conclusion, the ThermalTracker-3D system can be used to quantify the flight activity of birds and bats in terms of passage rates, flight height, flight speed and patterns of occurrence at remote locations. The data provided by the system can be used to inform collision risk models for a proposed wind energy site. With further refinements, ThermalTracker-3D can also be used to quantify avoidance behavior and detect collisions at operating wind energy sites, providing much-needed data to improve and validate current collision risk models.

Acknowledgements

The authors would like to thank Bethany Straw and Hristo Ivanov of the National Renewable Energy Laboratory for their invaluable support in making our test campaign a success. The authors would also like to thank the Red Mountain Scientific team for their professionalism and responsiveness.

This work was funded by the Wind Energy Technologies Office within the U.S. Department of Energy, Office of Energy Efficiency and Renewable Energy.

References

[1] Vineyard Wind offshore wind energy project biological assessment: Final. Tech. rep., Bureau of Ocean Energy Management, 2019.

[2] ADAMS, E., GOODALE, W., BURNS, S., DORR, C., DURON, M., GILBERT, A., MORATZ, R., AND ROBINSON, M. Stereo-optic high definition imaging: A new technology to understand bird and bat avoidance of wind turbines. Tech. rep., Biodiversity Research Institute, 2017.

[3] BRADSKI, G. The OpenCV Library. Dr. Dobb's Journal of Software Tools (2000).

[4] BUSCH, M., AND GARTHE, S. Approaching population thresholds in presence of uncertainty: Assessing displacement of seabirds from offshore wind farms. Environmental Impact Assessment Review 56 (2016), 31–42.

[5] CHAMBERLAIN, D. E., REHFISCH, M. R., FOX, A. D., DESHOLM, M., AND ANTHONY, S. J. The effect of avoidance rates on bird mortality predictions made by wind turbine collision risk models. Ibis 148 (2006), 198–202.

[6] COOK, A. S., HUMPHREYS, E. M., BENNET, F., MASDEN, E. A., AND BURTON, N. H. Quantifying avian avoidance of offshore wind turbines: current evidence and key knowledge gaps. Marine Environmental Research 140 (2018), 278–288.

[7] CULLINAN, V. I., MATZNER, S., AND DUBERSTEIN, C. A. Classification of birds and bats using flight tracks. Ecological Informatics 27 (2015), 55–63.

[8] DIRKSEN, S. Review of methods and techniques for field validation of collision rates and avoidance amongst birds and bats at offshore wind turbines. Report No. SjDE 17-01, 2017.

[9] FURNESS, R., WADE, H., AND MASDEN, E. Assessing vulnerability of marine bird populations to offshore wind farms. Journal of Environmental Management 119 (2013), 56–66.

[10] GUENNEBAUD, G., JACOB, B., ET AL. Eigen v3. http://eigen.tuxfamily.org, 2010.

[11] HARTLEY, R., AND ZISSERMAN, A. Multiple View Geometry in Computer Vision. Cambridge University Press, 2003.

[12] HARWOOD, A. J., PERROW, M. R., AND BERRIDGE, R. J. Use of an optical rangefinder to assess the reliability of seabird flight heights from boat-based surveyors: implications for collision risk at offshore wind farms. Journal of Field Ornithology 89, 4 (2018), 372–383.

[13] HUANG, J., CARUANA, R., FARNSWORTH, A., KELLING, S., AND AHUJA, N. Detecting migrating birds at night. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (June 2016), pp. 2091–2099.

[14] MASDEN, E., AND COOK, A. Avian collision risk models for wind energy impact assessments. Environmental Impact Assessment Review 56 (2016), 43–49.

[15] MATZNER, S., CULLINAN, V. I., AND DUBERSTEIN, C. A. Two-dimensional thermal video analysis of offshore bird and bat flight. Ecological Informatics 30 (2015), 20–28.

[16] SKOV, H., HEINÄNEN, S., NORMAN, T., WARD, R., AND MÉNDEZ, S. ORJIP bird avoidance behaviour and collision impact monitoring at offshore wind farms.

[17] THAXTER, C. B., ROSS-SMITH, V. H., AND COOK, A. S. How high do birds fly? A review of current datasets and an appraisal of current methodologies for collecting flight height data. Tech. Rep. BTO Research Report No. 666, British Trust of Ornithology, 2015.

[18] THERIAULT, D., WU, Z., HRISTOV, N., SWARTZ, S., BREUER, K., KUNZ, T., AND BETKE, M. Reconstruction and analysis of 3D trajectories of Brazilian free-tailed bats in flight. Tech. rep., CS Department, Boston University, 2010.

[19] THERIAULT, D. H., FULLER, N. W., JACKSON, B. E., BLUHM, E., EVANGELISTA, D., WU, Z., BETKE, M., AND HEDRICK, T. L. A protocol and calibration method for accurate multi-camera field videography. The Journal of Experimental Biology 217, 11 (2014), 1843–1848.

[20] WU, Z., HRISTOV, N. I., HEDRICK, T. L., KUNZ, T. H., AND BETKE, M. Tracking a large number of objects from multiple views. In Computer Vision, 2009 IEEE 12th International Conference on (2009), IEEE, pp. 1546–1553.

[21] WU, Z., THANGALI, A., SCLAROFF, S., AND BETKE, M. Coupling detection and data association for multiple object tracking. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on (2012), IEEE, pp. 1948–1955.

Highlights

• A thermal stereo vision technology for quantifying the flight behavior of birds and bats at offshore wind energy sites is presented.
• The technology uses a novel algorithm that generates three-dimensional flight track data in real-time.
• The technology was evaluated for position estimation accuracy using an unmanned aerial system equipped with GPS.
• The results showed that the estimated position data was within ±10 meters of the GPS-derived position in the x and y (flight height) dimensions, and within ±20 meters in the z (range) dimension for 90% of the data points.