Ecological Informatics 26 (2015) 27–35
Combining a locomotion indicator and data mining to analyze the interactive patterns between copepods and ciliates

Meng-Tsung Lee a, Jiang-Shiou Hwang b, Chih-Yung Hsu c, Yang-Chi Chang c,⁎

a Department of Marine Leisure Management, National Kaohsiung Marine University, No. 142, Haijhuan Rd., Kaohsiung 811, Taiwan
b Institute of Marine Biology, National Taiwan Ocean University, No. 2, Pei-Ning Road, Keelung 202, Taiwan
c Department of Marine Environment and Engineering, National Sun Yat-sen University, No. 70, Lien-Hae Road, Kaohsiung 804, Taiwan
Article info

Article history:
Received 23 November 2013
Received in revised form 6 January 2015
Accepted 7 January 2015
Available online 14 January 2015

Keywords:
Data mining
Locomotion indicator
Qualitative motion description
Copepod and ciliate interaction
Abstract

Interactions between zooplankton not only affect the world's carbon fixation but also have a direct impact on the yields of the fishing industry. Both copepods and ciliates have a crucial linkage role in constituting the marine food web. Analyzing the predator–prey interactions between these two species helps us to understand the productivity of oceans better. In this study, we explored the interactive patterns between copepods and ciliates and used the locomotion indicator net-to-gross displacement ratio (NGDR) to conduct quantitative analyses on the swimming patterns of copepods. We discovered that the movement trails of copepods are more distorted in undisturbed environments where the NGDR was significantly lower. In an environment where ciliates were present, the NGDR of copepods was significantly higher. This result indicated that the movement trails in the latter scenario were more linear and that the NGDR can clearly distinguish the swimming patterns of copepods. In addition, this study developed a qualitative motion description to be embedded in the interactive trail data of copepods and ciliates, which facilitates the data mining technologies needed to perform advanced analyses. The results of the association rules and decision tree analyses clearly demonstrated the interactive characteristics that could not be explored solely using locomotion indicators. The most obvious characteristic is that the swimming patterns employed by copepods to approach ciliates were either a downward vertical sinking or a horizontal movement. Upward movements for the copepods were not typically observed. More detailed swimming patterns of interactions between copepods and ciliates were revealed using the rule-based forms.

© 2015 Elsevier B.V. All rights reserved.
⁎ Corresponding author. Tel.: +886 7 5252000 5176.
http://dx.doi.org/10.1016/j.ecoinf.2015.01.001
1574-9541/© 2015 Elsevier B.V. All rights reserved.

1. Introduction

Both copepods and ciliates play a critical role in the marine food web (Chen et al., 2012; Hwang and Martens, 2011; Verity and Smetacek, 1996). These species constitute the major zooplankton in the ocean (Hwang and Martens, 2011; Sanoamuang and Hwang, 2011) and affect the fish population, carbon cycle and energy flow in the marine ecosystem (Dahms et al., 2012; Kiørboe et al., 1998; Ohman and Hirche, 2001). The individual behaviors of copepods and ciliates have been the focus of numerous studies (Chang et al., 2011; Doall et al., 2002; Schmitt et al., 2006; Wu et al., 2010; Yen et al., 2008). Analyzing the interactions between these two organisms remains an important issue to be explored because the results of such studies would help in understanding the fundamental mechanisms of the marine food web (Hwang and Strickler, 2001).

Many studies of zooplankton behavioral patterns in small-scale environments use video recording to trace movement trails (Buskey et al., 1987; Dahms and Hwang, 2010; Strickler and Hwang, 1999; Vandromme et al., 2010; Yen and Fields, 1992). Analyses after acquiring trail data mostly rely on locomotion indicators to objectively illustrate the characteristics of the zooplankton movement (Chen et al., 2012; Fields and Yen, 1997; Lee et al., 2010; Mazzocchi and Paffenhöfer, 1999). There are several locomotion indicators, such as net-to-gross displacement ratio (NGDR), diffusion coefficient (Visser and Thygesen, 2003), fractal dimension (Uttieri et al., 2005, 2007; Cianelli et al., 2009), and multi-fractal dimension (Schmitt and Seuront, 2001; Seuront et al., 2004a,b). Among these indicators, the NGDR is simple and effective and has been frequently used in related studies (Buskey et al., 1983; Chang et al., 2011; Chen et al., 2012; Jakobsen et al., 2005; Mazzocchi and Paffenhöfer, 1999; Tseng et al., 2013; van Duren and Videler, 1995; Weissburg et al., 1998; Wu et al., 2011). Eventually, statistical analyses of the indicators summarize the specific movement patterns for certain species of zooplankton.

Traditional data-oriented analyses, such as statistical inference, are hypothetico-deductive processes (Galitsky et al., 2007). These processes always presume relevant null and alternative hypotheses based on the literature or experts' knowledge and then substantiate the assumptions through an appropriate test. For biological data analyses, statistical inference plays an important role in finding correlations between biological parameters, but it is incapable of interpreting causal links
between related parameters (Galitsky et al., 2007). To address this problem, an alternative data-oriented analysis called data mining (DM) has been applied to analyze biological data. Unlike statistical inference, DM is an inductive inference process that typically draws general rules or patterns from a large data set. DM is an effective tool to assist decision makers in exploring useful information hidden in the data and finding solutions to a given problem (Han and Kamber, 2000). The applications of DM have covered various domains, including ecology, environment, and medicine (Chang et al., 2011; Creighton and Hanash, 2003; Dixon et al., 2007; Ekasingh and Ngamsomsuke, 2009; Kanevski et al., 2004; Recknagel, 2001; Xia et al., 2005; Zhang et al., 2005), and with the advancement of DM, many research domains, such as bioinformatics, medical and health informatics, and ecological and environmental informatics, have emerged. DM characteristically reveals hidden rules or patterns in a large data set and works well with unstructured and highly non-linear data, making it suitable for studying foraging behavior (Chang et al., 2011).

Although the NGDR indicator is able to describe the approximate changes of trails during the interactions of copepods and ciliates, the indicator is not sufficient to interpret more subtle interactive patterns spatially. Therefore, this study applied DM techniques for spatial analyses of interactive movements between copepods and ciliates. We adopted a qualitative motion description as proposed by Lattner et al. (2006) to represent the interactive trail data of the two organisms, which facilitates the advanced analyses with DM technology. Lattner et al. (2006) applied sequential pattern mining to find frequent patterns in dynamic situations of 2D RoboCup simulation games.

Researchers in various fields have used DM technology for spatial pattern analyses. Xia et al.
(2005) studied the spatiotemporal movement patterns of tourists visiting Phillip Island. Using clustering and decision tree analyses, they were able to identify patterns between the tourists' profiles and their spatiotemporal movement. Chang et al. (2011) used the decision tree method and locomotion indicators, including NGDR, turning levels or degrees, and fractal dimensions to analyze swimming patterns of ciliates in various foraging environments. Overall, the objectives of this study can be summarized as follows:
1. Using NGDR indicators to perform measurements of the data obtained from the swimming trails of copepods and ciliates, we described the qualitative abstraction of swimming behaviors in various situations.
2. We restructured the swimming trails of copepods and ciliates using a qualitative motion description such that the interactive patterns can be better represented.
3. By using data mining technology to perform analyses based on the coded trail data, we further delineated the interactive patterns between these two organisms.

2. Materials and methods

2.1. Data collection and preprocessing

2.1.1. Laboratory experiment

In order to study the interactions between copepods (predator) and ciliates (prey), five experiments were conducted and video-recorded by Professor Jiang-Shiou Hwang, one of the coauthors. The code names of the five experiments are listed in Table 1, in which “Co” stands for copepods, “Ci” for ciliates, and “A” for alga. The focus of the study was to analyze the interactive patterns of zooplankton, which made Co_Ci_A (1) and Co_Ci_A (2) the experimental group and Co, Ci_A, and Co_A the control group. The organisms used in the experiments were cultivated by the laboratory at the University of Texas Marine Science Institute (Texas, USA). The filming period lasted from August 1 to 2, 2004, and data were recorded in a dark room under the same environmental conditions at 22 ± 1 °C. The only source of light was an infrared LED lamp (peak wavelength 910 nm; 1.45 volts). Fifteen adult copepods of Acartia tonsa and 255 marine ciliates of Strobilidium sp. were placed inside 4.5 × 1.2 × 2 mm (length, width, and height) laboratory vessels containing 15 ml of sea water at a salinity of 30 ppt (parts per thousand) and a regular pH ranging from 7.4 to 8.5. The density of copepods was 1150 individuals per liter, which is common in copepod aquaculture ponds or coral reef environments and is also in the range of many coastal water habitats. The density of ciliates was 17,000 individuals per liter, which is in the range of many marine habitats. The high density of copepods and ciliates allowed us to track and visualize the prey/predator interactions. The algae that were used consisted of Isochrysis galbana, Gymnodinium sp., and Rhodomonas sp. at a 1:1:1 ratio. Filming began approximately 15 min after placement of the copepods and ciliates to allow them to adjust to the environment, and filming lasted approximately 1 h. The conditions and settings of the five experiments are summarized in Table 1.

2.1.2. Trail data collection

It is necessary to convert the videos into movement trail data for the subsequent analyses. Therefore, the data pre-processing work included digitizing the trail images and extracting the coordinates associated with the trails to generate the digital trail data. This study used LabTrack (Bioras, Kvistgård, Denmark) to digitize the trails in the video recordings. LabTrack can be used to track moving objects that are observed in continuous images and to perform frame-by-frame continuous image analyses. This software is well-suited for tracking zooplankton movement trails (Kiørboe, 2008; Titelman, 2001). The original videos were recorded at 29.97 fps (frames per second), and LabTrack extracted spatial coordinates from the trail images at 0.033 second intervals. For the experimental group, we recorded the trails of copepods and ciliates whenever both of them appeared on the filming screen throughout the videos. It is not necessary to sample too many trails from the control group, whose function is to provide a basis of comparison with the experimental group (Broglio et al., 2001). The study sampled 10 s of video every 5 min for tracking the trajectories of copepods and ciliates. Table 2 shows the total numbers of trails of the five experiments identified by the image analyses. DM used only the trails from the experimental group, while NGDR analyses used all the trails from the five experiments.

2.1.3. Qualitative motion description

This study used a qualitative motion description to replace the continuous coordinate data of the movement trails for DM. A numerical
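Table 1
Conditions and settings of the five experiments.

| Code name | Description | Room temperature | Ciliates | Algae | Copepods |
|---|---|---|---|---|---|
| Co | Only copepods exist | 23 °C | None | None | Acartia tonsa (15 individuals) |
| Co_A | Both alga and copepods exist | 22 °C | None | Mixture of Isochrysis galbana, Gymnodinium sp., and Rhodomonas sp. (about 1:1:1) | Acartia tonsa (15 individuals) |
| Ci_A | Both ciliates and alga exist | 23 °C | Strobilidium sp. (255 individuals) | Mixture of Isochrysis galbana, Gymnodinium sp., and Rhodomonas sp. (about 1:1:1) | None |
| Co_Ci_A (1) | Copepods, ciliates, and alga all exist | 23 °C | Strobilidium sp. (255 individuals) | Mixture of Isochrysis galbana, Gymnodinium sp., and Rhodomonas sp. (about 1:1:1) | Acartia tonsa (15 individuals) |
| Co_Ci_A (2) | Repeated experiment | 23 °C | Strobilidium sp. (255 individuals) | Mixture of Isochrysis galbana, Gymnodinium sp., and Rhodomonas sp. (about 1:1:1) | Acartia tonsa (15 individuals) |

Date: August 1, 2004; August 2, 2004
Light condition: Dark room with an infrared LED lamp (peak wavelength 910 nm; 1.45 volts)
Volume/salinity/pH of water: 15 ml of 30 ppt sea water and a regular pH from 7.4 to 8.5
Filming vessel: 4.5 mm by 1.2 mm by 2 mm (length, width, and height), with a cover on top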
Table 2
Numbers of trails of the five experiments.

| Experiment | Copepod trails | Ciliate trails |
|---|---|---|
| Co | 32 | – |
| Ci_A | – | 47 |
| Co_A | 30 | – |
| Co_Ci_A(1) | 74 | 209 |
| Co_Ci_A(2) | 88 | 223 |
description of motion between moving objects is less efficient for spatial reasoning (Burger and Bhanu, 1992). Therefore, using qualitative motion descriptions of interactive phenomena has attracted considerable interest in the field of artificial intelligence (Miene et al., 2004; Mossakowski and Moratz, 2012). We followed the idea of Lattner et al. (2006), who applied qualitative representations for sequential pattern mining in simulated robotic soccer games. The study of foraging behavior is similar to sport activity analysis in that both seek to find interactions between objects in space, which makes the qualitative motion description of trail data feasible and useful.

Fig. 1 shows an example of the coding process using the qualitative motion description. The first step was to rearrange all the trail data into sequential time segments with a 0.033 second temporal resolution. The second step was to aggregate the trail data from several time segments into a group with a longer time interval, which was set as the time span during which a copepod appeared on the filming screen. Thus, the temporal intervals were of variable durations, and the group of data within a time interval should cover the information of both copepods and ciliates. A resolution of 0.033 s is too fine for a qualitative motion description. Therefore, each temporal interval was divided into several 0.5 second time windows to facilitate the observation of the interactive behaviors of copepods and ciliates. For example, one copepod interacts with three ciliates during the temporal interval shown in the middle of Fig. 1. The first time window would accommodate two data records, one for Co1 and the other for Ci1, and throughout the temporal interval there should be 9 data records, as shown in the lower left part of Fig. 1. The third step was to convert the numerical trail data into a qualitative motion description, except for NGDR and distance. Within a time window, an NGDR value indicates one copepod's swimming behavior, and a distance is the Euclidean distance between a copepod and a ciliate. These two continuous attributes were converted into discrete attributes for the association rule analysis, which is discussed in a later section. However, the two attributes remained continuous for the decision tree analysis.

Fig. 1. Coding process of qualitative motion description for trail data.

Spatial relations regarding the positions of a prey and a predator in a 2D space affect their interactions. However, it is difficult to use coordinate data from a smaller time period (0.033 s) to represent spatial relations in a time window with a larger temporal interval (0.5 s). To resolve this problem, Cao et al. (2005) proposed the intuitive concept of converting precise spatial locations into larger spatial regions. This study divided the filming screen into a 3 × 3 grid, which can simplify the topological relationships in a continuous 2D space and represent the qualitative classes as suggested by the coding method of the qualitative motion description. According to the experimental setup and the LabTrack configuration, each video frame is in the X–Z plane with the origin in the upper left corner. Therefore, the nine-cell grid was arranged as shown in the lower center part of Fig. 1, where the upper row covered grids 1, 2, and 3 (from left to right), the middle row covered grids 4, 5, and 6, and the lower row covered grids 7, 8, and 9.

One important coding process of the trail data is to describe qualitatively the interactions between copepods and ciliates. This study defined five interaction behavior patterns: approach, depart, meet, miss, and constant. The five states of interaction were determined by the changes in distance between a predator and prey from one time window to the next. The approaching state referred to a noticeable decrease in distance, and the departing and constant states referred to an obvious increase or no change in distance, respectively. The meeting state was defined as the distance being shorter than 1.0 mm, which was inspired by Broglio et al. (2001), who suggested that 0.8 ± 0.12 mm would be an unsuccessful attack distance for copepods (Acartia clausi). The missing state was specified as when a meeting state occurred in the previous time window and the ciliate was still observed (survived) in the next time window.

2.2. NGDR indicators

This study used the NGDR and distance (between two objects) to measure the interactive behaviors of copepods and ciliates. The NGDR is a ratio that can measure the degree of path distortion (Weissburg et al., 1998) and the tendency for organisms to stay in certain regions by changing their directions of movement. The NGDR calculation is shown in Eq. (1) and consists of the linear distance between the starting point and the end point divided by the total path taken. The values for the NGDR range from 0 to 1. The closer the value is to 0, the more tortuous and distorted the motion path is. The closer the value is to 1, the more linear the motion path is.
\[ \mathrm{NGDR} = \frac{l}{L} \tag{1} \]

where l is the linear displacement distance between the starting point and the end point, and L is the total distance traveled.
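As an illustration of Eq. (1) and of the nine-cell grid coding described in Section 2.1.3, the following Python sketch computes the NGDR of one trail segment and maps a position to a grid cell. It is a minimal sketch, not the authors' implementation: the function names, the `width` and `height` frame dimensions, and the choice to return 0 for a track that never moves are assumptions made here for illustration.

```python
import math


def ngdr(points):
    """Net-to-gross displacement ratio, Eq. (1): l / L.

    points is a sequence of (x, z) coordinates sampled along one trail.
    """
    if len(points) < 2:
        raise ValueError("a trail needs at least two points")
    # L: total path length, summed over consecutive samples
    gross = sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    if gross == 0:
        return 0.0  # assumption: a motionless track is reported as 0
    # l: straight-line distance between the first and the last sample
    net = math.dist(points[0], points[-1])
    return net / gross


def grid_cell(x, z, width, height):
    """Map a position to a cell 1..9 of a 3 x 3 grid.

    The origin is the upper left corner; cells are numbered row by row,
    left to right, so the top row is 1, 2, 3 and the bottom row is 7, 8, 9.
    width and height are hypothetical frame dimensions.
    """
    col = min(int(3 * x / width), 2)   # clamp the right edge into column 2
    row = min(int(3 * z / height), 2)  # clamp the bottom edge into row 2
    return 3 * row + col + 1


straight = [(0, 0), (1, 0), (2, 0)]
bent = [(0, 0), (1, 0), (1, 1)]
loop = [(0, 0), (1, 0), (0, 0)]
print(ngdr(straight), ngdr(bent), ngdr(loop))
print(grid_cell(4.5, 4.5, 9, 9))
```

A straight trail gives a ratio of 1, a trail that returns to its starting point gives 0, and a trail with one right-angle turn falls in between, which matches the interpretation of the ratio given above.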
2.3. Data mining methods

Although researchers can apply the NGDR to indicate the interactive characteristics of copepods and ciliates, the locomotion indicator cannot provide sufficient information regarding the patterns of predator–prey interaction. For example, the NGDR is unable to support descriptions about the long-range detection ability of predators. Furthermore, predatory behaviors may be strongly influenced by prey distribution (Leising and Franks, 2002; Seuront et al., 2004a, 2004b). By employing state-of-the-art DM techniques and an appropriate method for qualitative motion description, the study can provide insight into the interactive behaviors of copepods and ciliates.

2.3.1. Association rules

Association rules are extremely useful in locating targeted relationships that are hidden among large amounts of data. They are typically represented in the form X → Y, where X is the antecedent itemset and Y the consequent itemset. X and Y are disjoint itemsets, meaning that X ∩ Y = ∅. Support and confidence, as defined by Eqs. (2) and (3), are two frequently used indicators of the strength of association rules. The support for rule X → Y refers to the percentage of transactions that contain both X and Y. Confidence refers to the probability of Y occurring within all transactions that contain X. The derived association rules must satisfy the minimum thresholds of support and confidence.

\[ \mathrm{Support}(X \rightarrow Y) = \frac{\sigma(X \cup Y)}{N} \tag{2} \]

\[ \mathrm{Confidence}(X \rightarrow Y) = \frac{\sigma(X \cup Y)}{\sigma(X)} \tag{3} \]

N is the total number of transactions, and σ(X) is the support count, defined as the total number of transactions that contain itemset X. The association rule analyses require the use of categorical or class data. We divided the two continuous attributes, distance and NGDR, in the coded trail data into four classes by K-means clustering. The value ranges of the four classes in the two attributes are shown in Table 3.

2.3.2. Decision tree

Decision tree analyses are based on classification algorithms, which can objectively distribute objects into one of several predefined classes (Friedl and Brodley, 1997). Such analyses divide a dataset in a recursive manner so that subsets of the data possess increasingly similar characteristics. The structure of a decision tree resembles that of an actual tree and is composed of a root node, intermediate nodes, and terminal nodes. The root node contains all available data, has no inputs, and can have from zero to many outputs. Intermediate nodes consist of divided datasets with one input and two or more outputs. Terminal nodes, also called leaf nodes, have one input but no output.

This study selected the J48 algorithm to conduct decision tree analyses. It is based on an improved version of the decision tree classification method C4.5 proposed by Quinlan (1993). J48 is an open source Java implementation of the C4.5 algorithm in the WEKA (Waikato Environment for Knowledge Analysis) data mining tool. This algorithm expands in a manner similar to a tree structure, and the downward computing process, which begins at the root node and stops at the leaf nodes, should generate rules that are easy to understand. During the generation of a tree branch, intermediate nodes are acquired by splitting their precedent nodes based on a designated attribute. To determine which attribute would best classify a dataset, entropy is generally used as the performance indicator (Tan et al., 2005), as shown in Eq. (4).

\[ \mathrm{Entropy} = -\sum_{i=1}^{n} p_i \log_2(p_i) \tag{4} \]
where i refers to a given class, and p_i refers to the proportion of class i in a dataset. Finally, n refers to the number of classes in an attribute. Entropy is frequently viewed as a measure of the degree of disorder or uncertainty. The higher the degree of disorder, the more random or disorderly the situation becomes. Decision tree analyses are designed to minimize the degree of disorder for data classification.

J48 adopts the fundamental concept of information entropy, as applied in the traditional ID3 (Iterative Dichotomiser 3) algorithm, to generate decision trees. Furthermore, J48 considers the number of classes (splits) for the best partition of nodes by using the gain ratio indicator. The gain ratio is used to normalize information gain (entropy) through a split information value that represents the potential information generated by splitting the dataset into several partitions. The attribute with the maximum gain ratio is selected as the best splitting attribute. Once a tree has been created, J48 goes back through the tree and prunes it by substituting leaf nodes for branches that do not help in gaining more information. The tree pruning process can further help to avoid problems caused by an over-fitting tree.

Table 3
Class ranges for distance and NGDR.

| Attribute | Class A | Class B | Class C | Class D |
|---|---|---|---|---|
| Distance (mm) | >2.628 | 1.978–2.628 | 1.2596–1.978 | <1.2596 |
| NGDR | >0.68 | 0.45–0.68 | 0.24–0.45 | <0.24 |

After data collection and preprocessing, the study applied WEKA software to conduct data mining analyses on the trail data, which had been coded using the qualitative motion description. WEKA is a free and popular machine learning software package developed by the University of Waikato in New Zealand. The software, written in Java, supports various data mining analyses such as classification, clustering, association rules, and the visualization of data. We used the association rules and decision tree functions to explore the intrinsic features in the trail data.

3. Results and discussions

3.1. NGDR for interactive trail analyses

The study used two sets of NGDR data to reveal the differences in the interactive behaviors of copepods and ciliates, one from the predator's perspective and the other from the prey's perspective. Four of the five experiments involved copepod movements, which were used to analyze the interactions from the predator's perspective. Three experiments involved ciliate movements, which were used to show the interactions from the prey's perspective.

3.1.1. Copepod perspective analyses

Fig. 2 shows the percentage histograms of four NGDR analyses. The Co_A and Co experiments in the control group have peak percentages at smaller NGDR values, as shown in Fig. 2a and b. NGDR values in both experiments were similar (Mann–Whitney U-test, p > 0.01), indicating that alga did not affect the swimming patterns of copepods. However, there were significant differences (Mann–Whitney U-test, p < 0.01) between the experimental group, Co_Ci_A(1) and Co_Ci_A(2), and the control group, Co and Co_A, as indicated in Fig. 2. The peak percentages
of the experimental group were located in the middle range of NGDR values, in contrast to the control group, whose peak percentages were at smaller NGDR values. Some copepods in the experimental group showed higher NGDR values, suggesting their movements were more similar to straight lines or cannonball trajectories.

Fig. 2. Percentage histograms of NGDR analyses from the perspective of copepods. (a and b are the control group; c and d are the experimental group).

3.1.2. Ciliate perspective analyses

As shown in Fig. 3a, the NGDR of ciliate movements when no copepods were present had smaller values, and more than 80% of the trails had NGDR values within 0–0.1. In contrast, the experimental group in Fig. 3b and c shows different distributions compared to the control group (Mann–Whitney U-test, p < 0.01). When copepods were present, some of the ciliates changed their swimming patterns from highly distorted trails to straight trails.

Fig. 3. Percentage histograms of NGDR analyses from the perspective of ciliates. (a is the control group; b and c are the experimental group).

3.2. Association rule analyses

The interactive patterns between copepods and ciliates derived by the association rule analyses were in rule-based form. The association rules identified by WEKA with a minimum confidence of 70% are shown in Table 4. Because the trail data were discretized, the attributes that appear in the rule antecedents and consequents are represented by their nominal values. For example, ‘dist = D’ means that the distance between a copepod and a ciliate is shorter than 1.2596 mm. Fourteen rules were acquired from the coded trail data, and these rules provide detailed information on the interactions of copepods and ciliates. To summarize the rule-based knowledge, we organized the rules with similar patterns into the following list.

1. When the interactive behavior between the copepods and ciliates was either meet or miss, their distance was 1.2596 mm or less. (Rules 1, 2 and 4)
2. When copepods were found in the top region and ciliates in the bottom region, and their distances were less than 1.978 mm, more than 71% of the copepods swam in sinking patterns, indicating that most of the copepods moved in a vertical downward direction to use less energy for foraging. (Rules 3 and 14)
3. When copepods were found in the bottom region and ciliates in the top region, more than 71% of the copepods did not swim upwards, and the typical interactive behavior was either depart or constant. This situation was the opposite of the previous one, indicating that most of the copepods would not spend much effort in moving up for foraging. (Rules 6, 8, 12, and 13)
4. When the distance between the copepods and ciliates was between approximately 1.978 and 2.628 mm, and the trajectories of the copepod
appeared to be distorted (NGDR between 0.24 and 0.45), their interactive behaviors remained constant 73% of the time. (Rule 10)
5. When the interactive behavior of the copepods and ciliates was either approach or depart, more than 74% of the swimming patterns for the copepods were close to linear movements (NGDR value greater than 0.68). (Rules 5, 7, 9 and 11)

Table 4
The results from association rules.

| No. | Antecedent or condition | Consequent | Number of events (c) | Confidence |
|---|---|---|---|---|
| 1 | action = meet | dist (a) = D | 138–138 | 100% |
| 2 | action = miss | dist = D | 60–56 | 93% |
| 3 | co = x (b), ci = x + 3, dist = D | action = approach | 71–63 | 89% |
| 4 | co = x, ci = x | action = meet | 127–112 | 88% |
| 5 | ci = 2, action = depart | NGDR = A | 25–20 | 80% |
| 6 | co = 9, ci = 1, action = constant | dist = A | 28–21 | 75% |
| 7 | co = 4, action = approach, dist = D | NGDR = A | 39–29 | 74% |
| 8 | co = 9, ci = 1, action = depart | dist = A | 31–23 | 74% |
| 9 | co = 5, action = approach, NGDR = A | dist = D | 23–17 | 74% |
| 10 | dist = B, NGDR = C | action = constant | 22–16 | 73% |
| 11 | co = 4, action = depart | NGDR = A | 35–25 | 71% |
| 12 | co = 9, ci = 1, dist = A | NGDR = B | 21–15 | 71% |
| 13 | co = 8, ci = 1 | action = depart | 24–17 | 71% |
| 14 | co = 2, ci = 8, action = approach | dist = C | 24–17 | 71% |

(a) dist: distance class between a copepod (co) and a ciliate (ci).
(b) x: the numbering of the 9-cell grid. For rule #3, x is restricted from 1 to 6.
(c) Number of events: the first number is the support count of the antecedent for the rule, and the second number is the support count where both antecedent and consequent occur.

3.3. Decision tree analyses

The decision tree analyses provided complementary information that might be overlooked by the association rules. There were over 250 rules in the original decision tree when the pruning process was disabled under the default settings in WEKA. To avoid the problem of over-fitting and create more concise and easily interpreted results, the pruning was enabled and 56 rules were acquired. We discussed the results with the domain expert, Prof. Hwang, and identified some interactive patterns between copepods and ciliates, as summarized in the following list.

1. When the distance between the copepods and ciliates was less than 0.6172 mm, their interactive behavior was classified as meet.
2. When the copepods were found in the top region and ciliates in the bottom region, the copepods would swim in sinking patterns. This interactive behavior was also identified by the association rule analyses. On these occasions, the decision tree analyses also indicated that the associated NGDR was higher (often >0.65), showing a more straight-line action for successful foraging.
3. When the copepods were found in the bottom region and ciliates in the top region, the copepods typically did not swim upwards, and their interactive behavior was classified as either depart or constant. This result is similar to that found with the association rules.
4. When the copepods and ciliates were found at the same depth, horizontal left-to-right or right-to-left swimming patterns for copepods frequently resulted in an approaching interactive behavior. This phenomenon was not revealed by the association rules.
5. When the interactive behaviors between the copepods and ciliates were approach or depart, the NGDR values for the copepods were typically found to be in the mid-to-high range, showing that the copepods were swimming in a path close to a straight line.
6. The NGDR values for the copepods exhibited different distributions related to the corresponding positions of matching ciliates. When the copepods and ciliates were found at the same depth, the NGDR values of the copepods were large (often >0.55), indicating more linear swimming trails. When the two organisms were found at different depths and not aligned vertically, the NGDR values of the copepods were small. Furthermore, the greater the difference in depth between copepods and ciliates was, the lower the NGDR values for the copepods were.
7. On meeting occasions, the copepods usually had higher NGDR values, with nearly 70% of them having NGDRs larger than 0.45. Approximately 40% had even higher NGDR values (>0.68). Because meeting events suggest later foraging activities, it is reasonable to argue that the copepods hunt down ciliates by swimming in cannonball-like trajectories.
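The rules listed above come from J48, which Section 2.3.2 describes as choosing splits by entropy (Eq. (4)) and gain ratio. The following Python sketch shows how these two quantities could be computed for a nominal attribute. It is only an illustration under stated assumptions: the function names are hypothetical, the four records are made-up toy data rather than the study's trail data, and WEKA's J48 adds steps (thresholds for continuous attributes, pruning) that are not reproduced here.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy of a list of class labels, Eq. (4)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """Gain ratio of splitting `labels` by a nominal attribute `values`."""
    n = len(labels)
    # group the class labels by attribute value
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # weighted entropy that remains after the split
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # split information: entropy of the attribute's own value distribution
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0


# Toy records (hypothetical): distance class versus interaction state.
actions = ["meet", "meet", "depart", "depart"]
dist_informative = ["D", "D", "A", "A"]    # separates the two states
dist_uninformative = ["D", "A", "D", "A"]  # tells nothing about the state

print(entropy(actions))
print(gain_ratio(dist_informative, actions))
print(gain_ratio(dist_uninformative, actions))
```

In this toy case the attribute that separates the two states perfectly receives the maximum gain ratio, while the attribute that is unrelated to the state receives zero, so a tree builder using this criterion would split on the former.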
3.4. Missing rate of copepods' foraging
We defined the missing rate of copepods' foraging as the number of missing events divided by the sum of the meeting and missing events. Using the missing rate, it was therefore possible to infer the relationship between distance and the likelihood of copepods' foraging, termed a successful meeting. The concentric circles shown in Fig. 4 are a schematic diagram of this relationship. The three distance intervals less than 0.6 mm had lower miss rates (at or below 30%) than the distance intervals larger than 0.6 mm. When the distance exceeded 0.6 mm, the miss rate reached 57.14%. The miss rate rose to 71.43% at distances over 0.8 mm, showing a much lower probability of successful foraging. This finding is similar to the results of a study by Broglio et al. (2001), which measured the successful distance of copepod predation on ciliates as 0.52 ± 0.20 mm and the missing distance as 0.81 ± 0.12 mm. The decision tree analyses found a comparable outcome. One of the decision rules found that when the distance between a copepod and a ciliate was less than 0.6172 mm, a meeting event would happen with acceptable correctness. This distance was smaller than the 1.2596 mm found by the association rules because the categorical data used in the association rule analyses had a lower degree of precision. By analyzing those meeting events, which may be associated with consecutive missing events, we found that the miss rate under this distance was also approximately 30%. Therefore, 0.6172 mm can be treated as the threshold of successful meeting distance for the copepods as estimated by the data mining analyses.
3.5. Discussions
The most obvious interactive pattern identified by both the association rules and the decision tree analyses was the sinking movement of the copepods when they were found in the top region and the ciliates were in the bottom region.
More than 71% of such events occurred under this designated spatial situation, but the downward movements rarely appeared in the control treatments without ciliates. The copepods would approach ciliates by swimming in a vertical downward direction for effective foraging. This special swimming behavior can reduce the hydrodynamic signals generated during movement and thus increase the rates of successful predation of copepods on ciliates (Titelman, 2001). In contrast, if the positions of the copepods and ciliates were exchanged,
Fig. 4. Schematic diagram of copepods' successful meeting in relation to distances.
their interactive behaviors became either depart or constant, which was also recognized by both the association rules and the decision tree analyses. All three analyses in the study clearly indicated that the NGDR values for the copepods were higher, implying more linear swimming trails, when ciliates appeared in the surroundings. Such a swimming pattern is reasonable for a predator (copepod) because it increases the chances of meeting a prey (ciliate) and is beneficial for copepods' foraging (Buskey et al., 1983; Visser and Kiørboe, 2006). The NGDR analyses in Fig. 2 also indicated that ciliates, not algae, would affect the swimming patterns of copepods (Jonsson and Tiselius, 1990). From the perspective of the ciliates, this study observed a substantial increase in NGDR values and moving speed for the ciliates when copepods were in the surrounding area. The NGDR analyses in Fig. 3 showed that the swimming patterns of ciliates are heavily related to the presence of copepods. Behavioral patterns are apparently different under the conditions of optimal foraging and avoiding predators (Titelman, 2001). Adaptive behaviors are crucial for any species to survive, especially in avoiding predation by reducing the chance of encountering predators. Some species achieve this by lowering their swimming velocities to minimize hydrodynamic signals, although such behavior sacrifices the ability to find resources (Jakobsen et al., 2005). An alternative method is to enhance species' long-range detection ability so they can execute successful escapes.
4. Conclusions
This study combined the locomotion indicator NGDR and data mining technologies to explore the interactive behaviors between copepods and ciliates. The advantage of using such a computational process is that the implicit patterns of zooplankton interactions can be discovered efficiently in a large trail data set.
Furthermore, this study provided a better understanding of the swimming patterns for copepods and ciliates under various conditions than simply observing changes in swimming velocity (Waggett and Buskey, 2006). The integration of the NGDR and the data mining tools provides more information that could not be explored solely using locomotion indicators. For example, our association rules showed that, at a distance between 1.978 and 2.268 mm, the copepod trails were more distorted (where the NGDR was recorded between 0.24 and 0.45). In addition, the spatial positions of the copepods and ciliates also affect their interactive behaviors, which were easily identified in this study. The most obvious behavior is that the swimming patterns employed by copepods to approach ciliates were either downward vertical sinking or horizontal movements. Upward movements for the copepods typically did not take place. Additional inference using the missing rate revealed a relationship between distance and copepods' foraging of ciliates. This relationship was similar to what the decision tree analyses had determined, which was that the distance of 0.6172 mm was the threshold for successful meeting distance for the copepods. The application of the qualitative motion description also facilitated the study. The coding method inspired by robotic soccer games was highly flexible because the design can be adjusted based on researchers' needs. The ability to accommodate motion descriptions in both spatial and temporal scales should make this coding method applicable for ecological research. Despite applying much-improved methods, this study still faced certain processing problems and deficiencies. Some recommendations for possible improvement in future studies are listed below.
• Concerning the qualitative motion description of the trajectories of copepods and ciliates, although coded trail data can reduce the human errors caused by visual observations, the coding process did not cover the subtle actions of the copepods' feeding apparatus (e.g., mouthparts, tentacles, and limbs). Therefore, the coded trail data focus solely on describing interactive movement and may be too simple for an overall behavioral description. In addition, the qualitative motion description did not record copepods' successful capture events, which may restrict our findings. Future adjustment of the coding method in these regards is therefore needed for better analyses.
• The videotaping in this study was shot in 2D X–Z coordinates on a flat screen to represent the copepods' and ciliates' trails. This approach meant that 3D trails were projected in 2D, which affected the parameters related to distance calculation and other factors. Future experiments may learn from studies of fish behavioral responses to toxicants, which used two cameras to record trajectories in 3D so that true spatial coordinates could be acquired to improve analyses (Fukuda et al., 2010; Kang et al., 2009).
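The effect of the 2D projection mentioned in the last recommendation can be illustrated with a short Python sketch. It compares the distance measured in the X–Z plane with the true 3D distance for one hypothetical pair of positions. The coordinates, the axis ordering (x, y, z) with y as the unrecorded depth axis, and the function names are assumptions made for illustration only; the 0.6172 mm value is the meeting threshold reported above.

```python
import math

MEET_THRESHOLD_MM = 0.6172  # meeting threshold reported in the text (mm)


def distance_3d(p, q):
    """True Euclidean distance between two (x, y, z) positions."""
    return math.dist(p, q)


def distance_xz(p, q):
    """Distance seen by a single camera that records only X and Z."""
    return math.hypot(p[0] - q[0], p[2] - q[2])


# Hypothetical positions in mm; y is the axis lost in the 2D recording.
copepod = (0.0, 0.0, 0.0)
ciliate = (0.3, 0.8, 0.4)

seen = distance_xz(copepod, ciliate)
true = distance_3d(copepod, ciliate)

# The projected distance can never exceed the true distance, so a pair
# may appear to be within the meeting threshold when it is not.
print(f"projected: {seen:.3f} mm, true: {true:.3f} mm")
print("meet in 2D:", seen < MEET_THRESHOLD_MM)
print("meet in 3D:", true < MEET_THRESHOLD_MM)
```

In this hypothetical case the pair would be classified as meet from the 2D recording but not from the 3D coordinates, which is one way the projection could bias distance-based parameters.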
Acknowledgments
This research was supported by grants NSC 98-2621-B-019-001MY3 and NSC 99-2611-M-019-009 from the National Science Council, and the ATU (Aim for the Top University) plan by the National Sun Yat-sen University. The authors also thank Professors J. Rudi Strickler from the University of Wisconsin, Milwaukee, USA, Edward J. Buskey from the University of Texas at Austin, USA and Dr. Cheng-Han Wu for their assistance at various stages of the experiment and for constructive suggestions regarding the experiments. The authors wish to thank three anonymous reviewers for valuable comments on earlier versions of the manuscript.
References
Broglio, E., Johansson, M., Jonsson, P.R., 2001. Trophic interaction between copepods and ciliates: effects of prey swimming behavior on predation risk. Mar. Ecol. Prog. Ser. 220, 179–186.
Burger, W., Bhanu, B., 1992. Qualitative Motion Understanding. Kluwer Academic Publishers.
Buskey, E.J., Mills, L., Swift, E., 1983. The effects of dinoflagellate bioluminescence on the swimming behavior of a marine copepod. Limnol. Oceanogr. 28, 575–579.
Buskey, E.J., Mann, C.G., Swift, E., 1987. Photophobic responses of calanoid copepods: possible adaptive value. J. Plankton Res. 9, 857–870.
Cao, H., Mamoulis, N., Cheung, D.W., 2005. Mining frequent spatio-temporal sequential patterns. Proceedings of the Fifth IEEE International Conference on Data Mining (ICDM'05), pp. 82–90.
Chang, Y.C., Yan, J.C., Hwang, J.S., Wu, C.H., Lee, M.T., 2011. Data-oriented analyses of ciliate foraging behaviors. Hydrobiologia 666, 223–237.
Chen, M.R., Moison, M., Molinero, J.C., Hwang, J.S., 2012. Assessing the effect of food and light on Calanus sinicus swimming behavior through video-recording experiments. J. Exp. Mar. Biol. Ecol. 422–423, 14–19.
Cianelli, D., Uttieri, M., Strickler, J.R., Zambianchi, E., 2009. Zooplankton encounters in patchy particle distributions. Ecol. Model. 220, 596–604.
Creighton, C., Hanash, S., 2003. Mining gene expression databases for association rules. Bioinformatics 19, 79–86.
Dahms, H.U., Hwang, J.S., 2010. Perspectives of underwater optics in biological oceanography and plankton ecology studies. J. Mar. Sci. Technol. 18, 112–121.
Dahms, H.U., Tseng, L.C., Hsiao, S.H., Chen, C.C., Kim, B.R., Hwang, J.S., 2012. Biodiversity of planktonic copepods in the Lanyang River (Northeastern Taiwan), a typical watershed of Oceania. Zool. Stud. 51, 160–174.
Dixon, M., Gallop, J.R., Lambert, S.C., Healy, J.V., 2007. Experience with data mining for the anaerobic wastewater treatment process. Environ. Model. Softw. 22, 315–322.
Doall, M.H., Strickler, J.R., Fields, D.M., Yen, J., 2002. Mapping the free-swimming attack volume of a planktonic copepod, Euchaeta rimana. Mar. Biol. 140, 871–879.
Ekasingh, B., Ngamsomsuke, K., 2009. Searching for simplified farmers' crop choice models for integrated watershed management in Thailand: a data mining approach. Environ. Model. Softw. 24, 1373–1380.
Fields, D.M., Yen, J., 1997. The escape behavior of marine copepods in response to a quantifiable fluid mechanical disturbance. J. Plankton Res. 19, 1289–1304.
Friedl, M.A., Brodley, C.E., 1997. Decision tree classification of land cover from remotely sensed data. Remote Sens. Environ. 61, 399–409.
Fukuda, S., Kang, I.J., Moroishi, J., Nakamura, A., 2010. The application of entropy for detecting behavioural responses in Japanese medaka (Oryzias latipes) exposed to different toxicants. Environ. Toxicol. 25, 446–455.
Galitsky, B.A., Kuznetsov, S.O., Vinogradov, D.V., 2007. Applying hybrid reasoning to mine for associative features in biological data. J. Biomed. Inform. 40, 203–220.
Han, J., Kamber, M., 2000. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, San Mateo, CA.
Hwang, J.S., Martens, K. (Eds.), 2011. Zooplankton behavior and ecology. Hydrobiologia 666, pp. 257–264.
Hwang, J.S., Strickler, J.R., 2001. Can copepods differentiate prey from predator hydromechanically? Zool. Stud. 40 (1), 1–6.
Jakobsen, H.H., Halvorsen, E., Hansen, W.B., Visser, A.W., 2005. Effects of prey motility and concentration on feeding in Acartia tonsa and Temora longicornis: the importance of feeding modes. J. Plankton Res. 27, 775–785.
Jonsson, P.R., Tiselius, P., 1990. Feeding behaviour, prey detection and capture efficiency of the copepod Acartia tonsa feeding on planktonic ciliates. Mar. Ecol. Prog. Ser. 60, 35–44.
Kanevski, M., Parkin, R., Pozdnukhov, A., Timonin, V., Maignan, M., Demyanov, V., Canu, S., 2004. Environmental data mining and modeling based on machine learning algorithms and geostatistics. Environ. Model. Softw. 19, 845–855.
Kang, I.J., Moroishi, J., Yamasuga, M., Kim, S.G., Oshima, Y., 2009. A study on swimming behavioral toxicity of Japanese medaka (Oryzias latipes) exposed to various chemicals for biological monitoring of water quality. In: Kim, Y.J., Platt, U., Gu, M.B., Iwahashi, H. (Eds.), Atmospheric and Biological Environmental Monitoring. Springer, pp. 285–293.
Kiørboe, T., 2008. Optimal swimming strategies in mate-searching pelagic copepods. Oecologia 155, 179–192.
Kiørboe, T., Tiselius, P., Mitchell-Innes, B., Hansen, J.L.S., Visser, A.W., Mari, X., 1998. Intensive aggregate formation with low vertical flux during an upwelling-induced diatom bloom. Limnol. Oceanogr. 43, 104–116.
Lattner, A., Miene, A., Visser, U., Herzog, O., 2006. Sequential pattern mining for situation and behavior prediction in simulated robotic soccer. In: Bredenfeld, A., Jacoff, A., Noda, I., Takahashi, Y. (Eds.), RoboCup 2005: Robot Soccer World Cup IX. LNCS vol. 4020. Springer, pp. 118–129.
Lee, C.H., Dahms, H.U., Cheng, S.H., Souissi, S., Schmitt, F.G., Kumar, R., Hwang, J.S., 2010. Predation of Pseudodiaptomus annandalei (Copepoda: Calanoida) by the grouper fish fry Epinephelus coioides under different hydrodynamic conditions. J. Exp. Mar. Biol. Ecol. 393, 17–22.
Leising, A.W., Franks, P.J.S., 2002. Does Acartia clausi (Copepoda: Calanoida) use an area-restricted search foraging strategy to find food? Hydrobiologia 480, 193–207.
Mazzocchi, M.G., Paffenhöfer, G.A., 1999. Swimming and feeding behaviour of the planktonic copepod Clausocalanus furcatus. J. Plankton Res. 21, 1501–1518.
Miene, A., Visser, U., Herzog, O., 2004. Recognition and prediction of motion situations based on a qualitative motion description. In: Polani, D., Browning, B., Bonarini, A., Yoshida, K. (Eds.), RoboCup 2003: Robot Soccer World Cup VII. LNCS vol. 3020. Springer, pp. 77–88.
Mossakowski, T., Moratz, R., 2012. Qualitative reasoning about relative direction of oriented points. Artif. Intell. 180–181, 34–45.
Ohman, M.D., Hirche, H.J., 2001. Density-dependent mortality in an oceanic copepod population. Nature 412, 638–641.
Quinlan, J.R., 1993. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.
Recknagel, F., 2001. Applications of machine learning to ecological modeling. Ecol. Model. 146, 1–3.
Sanoamuang, L.O., Hwang, J.S. (Eds.), 2011. Copepoda: biology and ecology. Hydrobiologia 666, p. 1.
Schmitt, F.G., Seuront, L., 2001. Multifractal random walk in copepod behavior. Phys. A 301, 375–396.
Schmitt, F.G., Seuront, L., Hwang, J.S., Souissi, S., Tseng, L.C., 2006. Scaling of swimming sequences in copepod behavior: data analysis and simulation. Phys. A 364, 287–296.
Seuront, L., Hwang, J.S., Tseng, L.C., Schmitt, F.G., Souissi, S., Wong, C.K., 2004a. Individual variability in the swimming behavior of the sub-tropical copepod Oncaea venusta (Copepoda: Poecilostomatoida). Mar. Ecol. Prog. Ser. 283, 199–217.
Seuront, L., Schmitt, F.G., Brewer, M.C., Strickler, J.R., Souissi, S., 2004b. From random walk to multifractal random walk in zooplankton swimming behavior. Zool. Stud. 43 (2), 498–510.
Strickler, J.R., Hwang, J.S., 1999. Matched spatial filters in long working distance microscopy of phase objects. In: Wu, J.L., Hwang, P.P., Wong, G., Kim, H., Cheng, P.C. (Eds.), Focus on Multidimensional Microscopy. Singapore: World Scientific Publishing 2, pp. 217–239.
Tan, P.N., Steinbach, M., Kumar, V., 2005. Introduction to Data Mining. Pearson Addison-Wesley, Boston, MA, USA.
Titelman, J., 2001. Swimming and escape behavior of copepod nauplii: implications for predator–prey interactions among copepods. Mar. Ecol. Prog. Ser. 213, 203–213.
Tseng, L.C., Dahms, H.U., Chen, Q.C., Hwang, J.S., 2013. Geospatial variability in the autumn community structure of epipelagic zooplankton in the upper layer of the northern South China Sea. Zool. Stud. 52.
Uttieri, M., Zambianchi, E., Strickler, J.R., Mazzocchi, M., 2005. Fractal characterization of three-dimensional zooplankton swimming trajectories. Ecol. Model. 185, 51–63.
Uttieri, M., Nihongi, A., Mazzocchi, M.G., Strickler, J.R., Zambianchi, E., 2007. Precopulatory swimming behaviour of Leptodiaptomus ashlandi (Copepoda: Calanoida): a fractal approach. J. Plankton Res. 29, 17–26.
van Duren, L.A., Videler, J.J., 1995. Swimming behaviour of developmental stages of the calanoid copepod Temora longicornis at different food concentrations. Mar. Ecol. Prog. Ser. 126, 153–161.
Vandromme, P., Schmitt, F.G., Souissi, S., Buskey, E.J., Strickler, J.R., Wu, C.H., Hwang, J.S., 2010. Symbolic analysis of plankton swimming trajectories: case study of Strobilidium sp. (Protista) helical walking under various food conditions. Zool. Stud. 49 (3), 289–303.
Verity, P.G., Smetacek, V., 1996. Organism life cycles, predation, and the structure of marine pelagic ecosystems. Mar. Ecol. Prog. Ser. 130, 277–293.
Visser, A.W., Kiørboe, T., 2006. Plankton motility patterns and encounter rates. Oecologia 148, 538–546.
Visser, A.W., Thygesen, U.H., 2003. Random motility of plankton: diffusive and aggregative contribution. J. Plankton Res. 25, 1156–1168.
Waggett, R.J., Buskey, E.J., 2006. Copepod sensitivity to flow fields: detection by copepod of predatory ctenophores. Mar. Ecol. Prog. Ser. 323, 205–211.
Weissburg, M.J., Doall, M.H., Yen, J., 1998. Following the invisible trail: kinematic analysis of mate-tracking in the copepod Temora longicornis. Philos. Trans. R. Soc. Lond. B Biol. Sci. 353, 701–712.
Wu, C.H., Dahms, H.U., Buskey, E.J., Strickler, J.R., Hwang, J.S., 2010. Behavioral interactions of the copepod Temora turbinata with potential ciliate prey. Zool. Stud. 49, 157–168.
Wu, C.H., Dahms, H.U., Cheng, S.H., Hwang, J.S., 2011. Effects of food and light on naupliar swimming behavior of Apocyclops royi and Pseudodiaptomus annandalei (Crustacea, Copepoda). Hydrobiologia 666, 167–178.
Xia, J., Ciesielski, V., Arrowsmith, C., 2005. Data mining of tourists' spatio-temporal movement patterns: a case study on Phillip Island. Proceedings of the 8th International Conference on GeoComputation, USA, pp. 1–15.
Yen, J., Fields, D.M., 1992. Escape responses of Acartia hudsonica (Copepoda) nauplii from the flow field of Temora longicornis (Copepoda). Arch. Hydrobiol.–Beih. Ergebn. Limnol. 36, 123–134.
Yen, J., Rasberry, K.D., Webster, D.R., 2008. Quantifying copepod kinematics in a laboratory turbulence apparatus. J. Mar. Syst. 69, 283–294.
Zhang, B., Valentine, I., Kemp, P.D., 2005. A decision tree approach modelling functional group abundance in a pasture ecosystem. Agric. Ecosyst. Environ. 110, 279–288.