Journal of Veterinary Behavior 18 (2017) 7-12
Journal of Veterinary Behavior journal homepage: www.journalvetbehavior.com
Research
Development of an ethogram to describe facial expressions in ridden horses (FEReq)
Jessica Mullard a, Jeannine M. Berger b, Andrea D. Ellis c, Sue Dyson a,*
a Centre for Equine Studies, Animal Health Trust, Newmarket, Suffolk, United Kingdom
b San Francisco SPCA, San Francisco, California
c Unequi Ltd, West Bridgford, Nottinghamshire, United Kingdom
Article info
Article history: Received 30 August 2016; Received in revised form 16 November 2016; Accepted 16 November 2016; Available online 25 November 2016
Abstract
Many horses presumed to be sound by their riders are not. Facial expression ethograms have previously been used to describe pain-related behavior in horses, but there is a need for a ridden horse facial ethogram to facilitate identification of pain in ridden horses. The objectives of this study were to develop and test an ethogram to describe facial expressions in ridden horses and to determine whether individuals could interpret and correctly apply the ethogram, with consistency among assessors. An ethogram was developed by reference to previous publications and photographs of 150 lame and nonlame ridden horses. A training manual was created. Thirteen assessors (veterinarians of variable experience, n = 4; equine technicians, n = 3; equine studies graduates, n = 2; amateur horse owners, n = 2; equine veterinary nurse, n = 1; a British Horse Society Instructor, n = 1) underwent a training session and, with reference to the training manual, evaluated still lateral photographs of 27 training heads. Features were graded as Yes, No, or “Cannot see” (when it was not possible to determine the presence or absence of a feature). The ethogram was adapted, and after further training, the assessors blindly evaluated 30 test heads from nonlame and lame horses. Intraclass correlation (ICC) and free-marginal kappa tests were used to assess consensus among assessors. For the training heads, the single ICC matrix among observers resulted in an overall ICC of 0.50 (95% confidence interval, 0.40-0.62). Four assessors consistently scored differently from the others, with ICCs ranging from 0.20 to 0.50 (mean, 0.41). There was no difference in assessors’ scoring related to their professional backgrounds. For the test heads, mean interrater agreement among assessors was 87%. Two assessors still scored consistently differently (ICC agreement 0.28-0.50; mean, 0.40) from the remaining 11 assessors (ICC agreement 0.44-0.69; mean, 0.56).
The mean percentage of overall agreement was 80%, and the mean free-marginal kappa value was 0.72 (standard deviation [SD], 0.22). The large SD was the result of inconsistency in assessments of the eyes and muzzle. It was concluded that the developed ethogram could reliably be utilized to describe facial expressions of ridden horses by people from different professional backgrounds. Future work needs to determine if nonlame and lame horses can be differentiated based on application of the ethogram. © 2016 Elsevier Inc. All rights reserved.
Keywords: lameness; pain; equine behavior
* Address for reprint requests and correspondence: Sue Dyson, Centre for Equine Studies, Animal Health Trust, Newmarket, Suffolk CB8 7UU, United Kingdom. E-mail address: [email protected] (S. Dyson).
http://dx.doi.org/10.1016/j.jveb.2016.11.005
1558-7878/© 2016 Elsevier Inc. All rights reserved.
Introduction
There are many horses that appear sound in hand but have underlying pain-related musculoskeletal problems which are evident to a trained observer when the horses are ridden (Dyson, 2016a) but which may go unrecognized by owners, riders, trainers, or a less-experienced veterinarian. Horses that experience musculoskeletal pain on the lunge and when ridden show a variety of gait modifications in an attempt to reduce pain. These include reducing the range of motion of the thoracolumbosacral spine (Greve et al., 2016a; Dyson, 2016a,b), taking shorter steps, altering limb flight, and increasing body lean (Greve et al., 2016b). It has been observed that owners, riders, and trainers appear to have a poor ability to recognize signs of pain seen when horses are ridden (Dyson and Greve, 2016). In a study of 506 sports horses in normal work and presumed to be sound, 47% were overtly lame or had other pain-related gait abnormalities (e.g., stiff, stilted canter), thus
highlighting the size of the problem (Greve and Dyson, 2014). As a result, problems are labelled as training related, rider related, behavioral, or deemed “normal” for that horse because “that is just how the horse has always gone.” Consequently, pain-related problems often get progressively worse, and if ultimately presented for investigation to an experienced lameness specialist, the problems are often too chronic and advanced to manage satisfactorily. Inability to perform satisfactorily may result in a decline in value of the horse and the standards of care (McLean and McGreevy, 2010). Improved pain recognition will enhance equine welfare. Many members of the veterinary profession have had little training in pain recognition and assessment of behavior and have had limited education in identification of low-grade lameness and recognition of musculoskeletal pain as a cause of poor performance. Behavioral changes related to experimentally induced orthopedic pain in horses have been described (Bussières et al., 2008; Lindegaard et al., 2010), but the features described, such as pawing the ground, are generally not applicable to ridden horses. The most recent advance in the recognition of subtle behavioral changes associated with pain is the investigation of facial expressions (Gleerup and Lindegaard, 2016). The spectrum of facial expressions exhibited by normal horses under nonridden circumstances has been described in detail (Wathan et al., 2015). An “equine pain face” was developed to describe facial features of horses with induced limb pain at rest (Gleerup et al., 2014). 
A Horse Grimace Scale, consisting of 6 features (the ears held stiffly backwards; orbital tightening [the eyelid is partially or completely closed]; tension above the eye; the mouth strained with a pronounced chin; the nostrils strained with flattening of their profile; and prominent strained chewing muscles), each graded at 3 levels (not present, moderately present, and obviously present), was developed to categorize the facial expressions of horses undergoing routine castration (Dalla Costa et al., 2014). However, it has been suggested that posture changes and overall body tension in resting horses may confound results and that further research is required to test the reliability of “pain grimace” measures (Hausberger et al., 2016). The Equine Utrecht University Scale for Facial Assessment of Pain (EQUUS-FAP) was developed to assess facial pain expressions in horses with or without colic; it used a 3-point system, from “normal = 0” to “maximal visible pain,” for 9 facial parameters to derive an overall pain scale (van Loon and Van Dierendonck, 2015). This system showed good reliability among 4 veterinary students and 2 veterinarians when looking at 10-minute video recordings of stabled horses. However, for working horses, the rider or trainer should be able to assess animals and recognize facial expressions overall. An initial outline of a scheme to assess some aspects of facial expression in ridden horses has been described (Hall et al., 2014); however, a detailed description of the results was not documented. Head movement, ear position, teeth grinding, and lip movements were described in a study comparing 2 groups of young horses (3.5 years of age) when first lunged under a saddle and first ridden at trot following either a “conventional training approach” or a “sympathetic training approach” (Visser et al., 2009). A comprehensive method for describing facial expressions in ridden horses, potentially to determine the presence of pain, has not been reported previously.
The aim of the present study was to develop and test an ethogram to describe facial expressions in photographs of ridden horses and to determine whether individuals could interpret and correctly apply the ethogram, with consistency among assessors. The objective was to look at ridden facial expressions in general, rather than to focus immediately on pain, which might introduce observer bias. The results can then be applied in future studies to assess photographs and video recordings of ridden horses and to test for differences in expressions due to pain or stress.
Materials and methods
The study was approved by the Clinical Ethical Review Committee of the Animal Health Trust (AHT 29 2015).
Development of the ethogram
A Royal College of Veterinary Surgeons Specialist in Equine Orthopaedics and British Horse Society Instructor (S.D.) reviewed close-up photographs (lateral and frontal images) of the heads of 150 ridden horses, which had been photographed for reasons unrelated to the present study, with reference to previously published facial expression grading systems (Gleerup et al., 2014; Dalla Costa et al., 2014; Wathan et al., 2015). An ethogram for facial expressions was developed in conjunction with a Diplomate of the American College of Veterinary Behavior and Diplomate of the American College of Veterinary Welfare (J.M.B.). A comprehensive training manual was created by S.D., which documented, using text and photographs, all likely facial expressions and combinations thereof (see Supplementary Information - The Training Manual).
Training of assessors
There were 13 assessors: veterinarians with variable levels of equine experience (interns [recent graduates undergoing equine postgraduate training], n = 2 [LZ, JR]; equine veterinarians qualified 1 and 4 years, respectively, both of whom had undergone equine internships [JM, LJ], n = 2); equine studies graduates (CM, SG), n = 2; a British Horse Society Instructor (AB), n = 1; amateur horse owners (KB, KS), n = 2; an equine veterinary nurse (JB), n = 1; and equine technicians (AMcG, ML, HS), n = 3. All assessors underwent a training lecture of approximately 1 hour’s duration and were provided with the training manual. Images of 27 numbered heads, showing a variety of facial expressions, were selected from Internet photographs by J.M.B. and comprised lateral images. Using a random number generator, 20 of the 27 heads were assigned to each assessor for evaluation in a specific order.
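The paper states only that a random number generator assigned 20 of the 27 training heads to each assessor in a specific order; it does not name the tool. A minimal sketch of such an assignment using Python’s standard `random` module might look like this. The helper name `assign_heads`, the seed value, and the choice of one seeded generator per run are assumptions for illustration.

```python
import random


def assign_heads(assessors, n_heads=27, n_per_assessor=20, seed=None):
    """Hypothetical helper: give each assessor a random subset of heads in a random order.

    Heads are numbered 1..n_heads. random.sample draws distinct heads and
    returns them already shuffled, so the list order is the viewing order.
    """
    rng = random.Random(seed)  # seeded so that an assignment can be reproduced
    heads = list(range(1, n_heads + 1))
    return {assessor: rng.sample(heads, n_per_assessor) for assessor in assessors}


# Assessor initials as listed in the study; the seed is arbitrary.
assessors = ["LZ", "JR", "JM", "LJ", "CM", "SG", "AB",
             "KB", "KS", "JB", "AMcG", "ML", "HS"]
plan = assign_heads(assessors, seed=2015)
for initials, order in plan.items():
    print(initials, order)
```

The same call with `n_heads=30, n_per_assessor=30` would give every assessor all 30 test heads, each in a different order, as described for the test phase.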
Each assessor was asked to complete the ethogram for each head, using a yes/no scale to document the presence of each feature; if it was not possible to determine the presence or absence of a feature according to the definition and illustrations in the training manual, “Cannot see” was recorded. The head photographs were available both in electronic format (which could be enlarged) and as hard copy. Head positions were measured using a protractor. In a pilot study using images of 10 horses, each assessed 5 times in random order, it was shown that measurements could be reliably acquired with a variance of <2%. The completed ethograms were evaluated by a trained analyst (J.M.), an equine veterinarian with 1 year of postgraduate equine training and additional training in equine behavior. Following assessment of the training heads, the assessors were given feedback about their evaluations by J.M., and minor adjustments were made to the wording of the ethogram (Table 1 and Supplementary Table 1) to provide added clarity before analysis of the test heads.
The test heads
Thirty test heads were selected by S.D. to reflect a range of facial expressions observed in ridden horses (Figure 1). These comprised lateral photographs of the heads of 2 groups of ridden horses: nonlame and lame. All assessors evaluated all 30 heads, each in a different order determined using a random number generator. All observers were blinded to horse group, were unaware of the number of horses in each group, and had no other knowledge of the horses.
Table 1
An overview of the features of facial expression graded in the ethogram (see Supplementary Table 1) for assessment of ridden horses (grades: 0, cannot see; 1, not present; 2, present)
Eyes: round versus almond-shaped with tension of musculus levator anguli oculi medialis; open or semiclosed; normal expression or intense stare or cannot tell; sclera exposed, yes/no
Ears: forward; erect and parallel with pinnae facing forward; erect and to side, with pinnae facing outward (divergent); one forward and one erect and to side (divergent); one erect and to side (divergent) and one back; one ear forward and one back; or both ears back
Mouth: closed; tongue out but otherwise lips closed; lips separated but cannot see teeth; lips open showing teeth and no gum; lips open showing teeth and gum and teeth apposed; mouth open, that is, teeth slightly separated, but cannot see tongue; mouth open, that is, teeth widely separated, but cannot see tongue; mouth open and teeth slightly separated, exposing tongue; mouth open, teeth exposed and separated, and tongue outside oral cavity
Jaw crossed, that is, upper and lower teeth not aligned: yes/no/cannot tell
Salivation: yes/no
Nostrils: tear drop or oval shape or rounded and angular; wrinkle between nostrils, yes/no; nostrils drawn to one side, yes/no/cannot tell
Muzzle: upper muzzle in line with lower muzzle; upper muzzle extended and angled; lower muzzle relaxed with curved contour; muzzle tense and angled
Tongue: in; tip of tongue protruding; large part of tongue out but cannot see teeth; large part of tongue out and teeth exposed
Head: erect, straight; head tipped to one side, nose to left or right; head turned
Front of head: vertical; behind vertical up to 10°; behind vertical >10-30°; behind vertical >30°; in front of/above vertical up to 10°; in front of vertical >10-30°; >30° in front of/above vertical
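Table 1 grades the front of the head in bands relative to the vertical, and head positions were read with a protractor. The function below is a minimal sketch of how a single reading could be mapped onto those bands. Its sign convention (negative = behind the vertical, positive = in front of/above it), the hypothetical `tolerance` for calling a head exactly vertical, and the shortened band labels are assumptions for illustration, not part of the published ethogram.

```python
def head_position_band(angle_deg, tolerance=0.0):
    """Map a protractor reading (degrees) to a Table 1 head-position band.

    Assumed convention: negative angles are behind the vertical, positive
    angles are in front of/above it. `tolerance` is a hypothetical margin
    within which the front of the head is treated as vertical.
    """
    if abs(angle_deg) <= tolerance:
        return "vertical"
    side = "behind vertical" if angle_deg < 0 else "in front of/above vertical"
    magnitude = abs(angle_deg)
    if magnitude <= 10:          # "up to 10°" includes 10°
        return f"{side} up to 10°"
    if magnitude <= 30:          # ">10-30°"
        return f"{side} >10-30°"
    return f"{side} >30°"        # ">30°"
```

For example, a hypothetical reading of 35° behind the vertical would be `head_position_band(-35)`, which returns "behind vertical >30°", the band described for horse 12 in Figure 1. Keeping the band edges inclusive at 10° and 30° follows the “up to 10” and “>10-30” wording of the table.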
Statistical analysis
Chi-square correlations among observers within single behavioral measurements were carried out for the training data, and scatter graphs and Spearman rank correlation between related behavioral measurements were applied to test for consistency in scoring. Results from this test were used to adapt the ethogram used for the training heads to that used for the test heads and are not reported here in detail. An intraclass correlation (ICC) matrix for single samples (incomplete scoring for training data [the assessors each evaluated 20 of 27 heads]; complete scoring for test data [all assessors evaluated the same 30 heads]) was computed. This analysis based agreement coefficients on “exact agreement” (McHugh, 2014) and can therefore be used cautiously for nominal scores (as long as exact agreement is tested for), to assess initial ICC and to highlight whether some assessors scored obviously differently. Analysis was performed using SPSS 19 (SPSS Inc., Chicago, IL). Generally, for ICCs, a correlation of ≥61% is regarded as substantial and ≥81% as almost perfect (Landers, 2015). The coefficients derived from this analysis, however, are not valid values to assess actual correlation for individual behavior scores. For this purpose, a free-marginal kappa test for multiple observers using nominal scores (adapted by Randolph, 2005, from Cohen, 1960), computed with a calculator published by Randolph (2005), was applied to assess interobserver reliability. The free-marginal version is appropriate because raters are not restricted in the number of cases that can be assigned to each category, and because the design is balanced, with each rater (assessor) assessing each case (horse), as in the test phase. Values of κ range from −1.0 to 1.0, with −1.0 indicating perfect disagreement below chance, 0.0 indicating agreement equal to chance, and 1.0 indicating perfect agreement above chance. A κ of 0.70 indicates adequate interrater agreement (Cohen, 1960; Randolph, 2005).
Results
Ethogram development: Training heads
The results and discussion with observers highlighted that certain behaviors were difficult to identify on the photographs. Forty-five percent of the behaviors were scored as “Cannot see” for >25% of the observations (Figure 2). However, for 65% of the individual behaviors which were scored, there was an average median assessor agreement of 70% across all horses. All related behaviors were compared for consistency. For example, the “relaxed” and “tight” head positions were scored fairly consistently: many observations scored as not relaxed (no) were scored as tight (yes), and vice versa (Figure 3A), which would be expected from consistent scoring. Equally, assessing whether 2 opposing markers (which cannot occur simultaneously) have nevertheless both been scored as occurring (yes) is important when testing the validity of an ethogram; testing for consistency among observers does not evaluate this aspect. Again, for relaxed and tight head positions, there was no occurrence of 1 and 1 (yes-yes) combinations for these opposing behaviors, showing good consistency in assessing behavioral markers. More than 25% of these observations were scored as “Cannot see” (Figure 2), and the descriptors for these were adjusted to gain more clarity for the test heads. The majority of horses were scored the same for ears as “forward” and as “erect and parallel” (Figure 3B); however, >25% of horses were scored with clear distinction, so this behavioral measure was retained. If all horses had been scored the same, then 1 of these markers could have been dropped in the adapted ethogram.
Figure 1. Lateral images of 3 test heads of horses 12 (A), 26 (B), and 30 (C). In (A), the right ear is erect with the pinna rotated outward. The left ear is forward. The right eye is open. The sclera cannot be seen. The mouth is slightly open but the tongue, teeth, and gums cannot be seen. Salivation is present. The lower muzzle is tense and angled, and the upper muzzle is extended and angled. The front of the horse’s head is >30° behind the vertical. In (B), both ears are back. The left eye is open, almond-shaped, with tension of musculus levator anguli oculi medialis. The sclera is visible. The left nostril is rounded and angular, with mediolateral widening and a wrinkle between the nostrils. The mouth is closed. Salivation is present. The upper muzzle is in line with the lower muzzle. The front of the horse’s head is >30° in front of the vertical. In (C), the left ear is forward and the right ear is backward. The right eye is open, almond-shaped, with tension of musculus levator anguli oculi medialis and an intense stare. The sclera is not visible. The mouth is open exposing the tongue and lower teeth but not the gum. There is no salivation. The lower muzzle is tense and angled and not in line with the upper muzzle. The front of the horse’s head is <10° in front of the vertical. The right nostril is rounded and angular, with mediolateral widening.
Figure 2. The percentage of observations of 25 behaviors which were scored as “Cannot see” by 13 assessors scoring still photographs of 27 ridden horses. The 28 behaviors which were scored “Cannot see” for <25% of observations are not shown.
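The check described above for the relaxed and tight head positions, cross-tabulating two related markers and counting observations in which two mutually exclusive markers were both scored “yes”, can be sketched as follows. This is an illustrative reconstruction, not the authors’ analysis script; scores are kept as the labels yes/no/cannot see rather than numeric grades, and the example scores are invented.

```python
from collections import Counter


def cross_tab(scores_a, scores_b):
    """Count each (score_a, score_b) combination over paired observations."""
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must be paired, one entry per observation")
    return Counter(zip(scores_a, scores_b))


def contradictory_pairs(scores_a, scores_b):
    """Number of observations in which two opposing markers were both scored 'yes'."""
    # Counter returns 0 for combinations that never occurred.
    return cross_tab(scores_a, scores_b)[("yes", "yes")]


# Invented scores, one entry per (assessor, head) observation.
relaxed = ["no", "yes", "no", "cannot see", "yes", "no"]
tight = ["yes", "no", "yes", "cannot see", "no", "yes"]
print(cross_tab(relaxed, tight))
print(contradictory_pairs(relaxed, tight))  # → 0
```

A nonzero count would flag observations where an assessor marked two opposing markers as simultaneously present, which, as noted above, agreement statistics alone would not reveal.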
Assessor correlation: Training heads
Mean rater agreement among assessors for all observations was 69%. The single ICC matrix among observers resulted in an overall ICC of 0.50 (95% confidence interval [CI], 0.40-0.62), with a significant but weak correlation (F = 14; P < 0.001). The pairwise ICC matrix (incomplete scoring) highlighted that 4 assessors scored consistently differently from the other 9 assessors, with ICCs ranging from 0.20 to 0.50 (mean, 0.41). Omission of these assessors led to an improvement in ICC to 0.63 for the other 9 assessors. There was no difference in assessors’ scoring related to their professional backgrounds.
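The ICCs above were computed in SPSS 19 on “exact agreement”, but the paper does not state which ICC model was used. As a hedged illustration only, the function below computes a two-way, absolute-agreement, single-measures ICC from a complete heads × assessors table with the standard mean-square formula. It assumes every assessor scored every head (as in the test phase, not the incomplete training design) and treats the Table 1 grades as numbers, which, as the Statistical analysis notes, is only a cautious use for nominal scores.

```python
def icc_a1(ratings):
    """Two-way, absolute-agreement, single-measures ICC (assumed model).

    ratings[i][j] is assessor j's grade for head i; the table must be complete.
    """
    n = len(ratings)       # number of heads (targets)
    k = len(ratings[0])    # number of assessors (raters)
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Partition the total sum of squares into heads, assessors, and residual.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    # Absolute agreement: systematic differences between assessors lower the ICC.
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)
```

Called on a two-column table (one row per head, one column per assessor), this gives a pairwise coefficient of the kind summarized in the pairwise ICC matrix. For example, `icc_a1([[1, 2], [2, 3], [3, 4]])` returns about 0.67 rather than 1.0, because the second assessor is consistently one grade higher and an absolute-agreement ICC penalizes that offset.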
Figure 3. Correlation between scores (0 = cannot see; 1 = present; 2 = not present) for pairs of behaviors: (A) relaxed versus tight head position and (B) ears forward versus ears erect and parallel. The length of the black lines represents the total number of scores per combination (overlapping points have been spread out into lines).
Summary of assessor correlations according to behaviors
Supplementary Table 2 summarizes the results for each behavior for both the training heads and the test heads. The majority of the behaviors which were repeatedly graded as “Cannot see” for the training heads were assigned a grade (yes/no) for the test heads.
Assessor correlation: Test heads
Mean rater agreement among assessors for test observations was 87%. The pairwise ICC matrix (complete scoring) highlighted that 2 assessors still scored consistently differently (0.28-0.50 agreement; mean, 0.40) from the remaining 11 assessors (0.44-0.69 agreement; mean, 0.56).
Free-marginal interassessor correlations
The interassessor correlations were relatively high: the mean percentage of overall agreement (Po) was 80%, and the mean free-marginal κ value was 0.72, standard deviation 0.22 (Supplementary Table 3), that is, adequate agreement (Randolph, 2005). However, the standard deviations were high because of 13 behavioral scores, related to the eye or muzzle, which had poorer correlation among assessors (Supplementary Table 3). Without these 13 values, the free-marginal κ increased substantially to 0.82 ± 0.12.
Discussion
An ethogram was developed to describe facial expressions in ridden horses which, after adaptation and training, was adequately interpreted by assessors from different professional backgrounds, with reasonable consistency for most observations. Observations relating to the eye and muzzle were the least consistent, with κ values ranging from 0.25 to 0.42. Results for the test heads were superior to those for the training heads, following additional training and adaptation of the ethogram. The proportion of “Cannot see” remained high for twisted head (43%) and wrinkle between nostrils (70%) because only lateral photographs were included. Frontal pictures should enable correct assessment of these features.
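Randolph’s free-marginal κ replaces Cohen’s chance term with 1/k, where k is the number of categories, so it depends only on the observed agreement: κ_free = (Po − 1/k)/(1 − 1/k), with Po the proportion of agreeing rater pairs within each case, averaged over cases. The study used Randolph’s online calculator; the function below is an independent sketch of the same published formula, not that calculator, and the example counts are invented. With the three response options yes, no, and “Cannot see”, k = 3 and chance agreement is 1/3, so including or omitting the “Cannot see” option changes the baseline against which agreement is judged.

```python
def free_marginal_kappa(counts):
    """Randolph's free-marginal multirater kappa (sketch).

    counts[i][c] is the number of raters who placed case i in category c.
    Every case must have the same number of raters (balanced design).
    Returns (Po, kappa_free).
    """
    n_cases = len(counts)
    n_categories = len(counts[0])
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("each case needs the same number of raters")
    # Observed agreement: agreeing ordered rater pairs over all possible pairs.
    agreeing_pairs = sum(c * (c - 1) for row in counts for c in row)
    p_o = agreeing_pairs / (n_cases * n_raters * (n_raters - 1))
    p_e = 1 / n_categories  # chance agreement when marginals are free
    return p_o, (p_o - p_e) / (1 - p_e)


# Invented counts for one behavior: 3 heads x 13 assessors, categories (yes, no, cannot see).
example = [[11, 1, 1], [2, 10, 1], [0, 13, 0]]
p_o, kappa = free_marginal_kappa(example)
print(f"Po = {p_o:.2f}, free-marginal kappa = {kappa:.2f}")
```

Applying the function separately to each behavior and averaging the results would parallel the mean Po and mean κ reported in the Results, although the exact values depend on the data in Supplementary Table 3.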
The importance of introducing the score “Cannot see” in developing an ethogram of facial expressions is highlighted by these results and adaptations (Supplementary Table 2); this option also takes more account of human error and chance scoring. Some previous studies did not incorporate this option when assessing interrater reliability or ethogram validity (Gleerup et al., 2014; Wathan et al., 2015). Alterations in the shape of the muzzle were poorly interpreted in the present study. In contrast, in a castration study based on assessment of still images from video recordings, the ICC for 6 assessors for “mouth strained and pronounced chin,” a descriptor of the lower muzzle, was 0.72 (Dalla Costa et al., 2014). However, the ICC for “strained nostrils and flattening of the profile,” a descriptor of the upper muzzle, was only 0.58, similar to the present study. The shape of the eye and alteration in tension in the periorbital muscles were also not reliably assessed in the present study. Orbital tightening (the eyelid is partially or completely closed) and tension above the eye (resulting in increased prominence of the “temporal crest bone”) had ICCs of 0.83 and 0.86, respectively, in the study of Dalla Costa et al. (2014). The reason for these differences is not known. They may occur because horses in pain may express these descriptors more strongly and clearly, making the 3-level scale used by 5 observers in that study (i.e., not present-present-obvious; Dalla Costa et al., 2014) easier to interpret, whereas the present study used a larger number of descriptors of the eye and a larger range of observers (n = 13). Wathan et al. (2015) linked changes in this facial area to underlying muscle groups, but this approach was not considered feasible for applied practical assessment by untrained observers. Two assessors performed differently from the remaining 11, despite training. This may reflect lack of attention to detail, less ability to learn, the speed with which they performed their assessments, or genuine inability to recognize the features described in the ethogram. Lack of consistency among observers engaged in grading lameness has been previously described (Fuller et al., 2006; Keegan et al., 2010). Our study used still photographs to develop the initial ridden facial ethogram. For further development and application, the whole body of ridden horses will be included in video recordings, but initially this would have provided insufficient detail (sharpness) for assessment of the head in isolation. Still photographs provided an improved level of detail and were therefore used in this development study. This meant that it was not feasible to apply the Equine Facial Action Coding System (Wathan et al., 2015), which relies in part on the duration of an event (e.g., duration of eye closure). By relying on assessment of the heads in isolation, the assessors were not biased by observations of other aspects of the horses’ behavior which may have influenced interpretation (e.g., tail swishing). A specific ethogram for ridden horses was required because previously published pain scales (Bussières et al., 2008; Lindegaard et al., 2010; Gleerup et al., 2014; Dalla Costa et al., 2014; van Loon and Van Dierendonck, 2015) provided insufficient detail to document, in particular, alterations in ear position, mouth opening, position of the tongue, and position of the head relative to the neck.
The desirability of a ridden horse ethogram has been previously highlighted (Hall et al., 2013) because many riders lack understanding of how horses may respond to pain. In a clinical situation, the horse’s facial expression is likely to differ at rest compared with ridden exercise, allowing comparison between the 2, and may vary during ridden exercise depending on the athletic demands being placed on the horse (Christensen et al., 2014; Górecka-Bruzda et al., 2015), the skill of the rider (Eisersiö et al., 2013), and the environment in which it is being worked. Ear position may be influenced by a variety of factors, including the length of the reins (Ludewig et al., 2013), noises, discomfort, negative interactions, and possibly ill health (Hausberger et al., 2016). Van Loon and Van Dierendonck (2015) addressed this partially by introducing “sound” and assessing horses’ reaction to it in their EQUUS-FAP scale. There are further potential differences between ridden horses and nonridden horses that are influenced by the bit (Manfredi et al., 2009; Cook and Mills, 2009), a restrictive noseband (McGreevy et al., 2012; Casey et al., 2013), contact via the reins to the rider (Eisersiö et al., 2013), alteration of head posture, possibly by force (McLean and McGreevy, 2010), the influence of physical exertion relative to the fitness of the horse, and the skill and even weight distribution of the rider. These factors may affect scoring of the mouth, tongue, head posture, and position of the nostrils, in particular, and may warrant the inclusion of descriptors such as “behind the bit,” “rein tension,” and the tongue-mouth positions. It is anticipated that the FEReq ethogram (Supplementary Table 1) could be applied “live” to ridden horses; it is currently used daily as part of routine clinical assessment by one author (S.D.), observing each horse from several angles. This study had limitations.
It was restricted to the analysis of still photographs, capturing facial expression at one moment in time. We demonstrated that there was reasonable reliability of interpretation of the ethogram based on still photographs, but this does not necessarily translate to assessment of live horses. However, the study was initiated because one author (S.D.) had repeatedly observed these facial expressions in real time, and we believe that
many of the observations can be made reliably and repeatedly. This phase of the study did not attempt to determine whether the facial expressions could be used to determine the likely presence or absence of pain, which is the subject of a separate study. The study did not assess other aspects of equine behavior observed during ridden exercise, which are also the subject of a different study.
Conclusions
An ethogram for facial expressions in ridden horses has been developed, which can reliably be utilized by people from different professional backgrounds. This novel work is the first step toward assessing pain in ridden horses other than through obvious gait changes or physiological posture. Future work needs to determine if pain-free and lame horses can be differentiated based on application of the ethogram (in still and moving pictures) and if key markers can be identified, so that the ethogram can be simplified to enable its use by all stakeholders in the equine industry to make progress in the key issues of welfare and performance of ridden horses.
Acknowledgments
The study was generously supported by World Horse Welfare and the Saddle Research Trust. The sponsors had no role in any aspect of the study. The authors thank the assessors: Karena Bean, Anne Bondi, Julie Breingan, Siobhan Gilligan, Laura Jones, Melissa Lockwood, Claire Martin, Abbi McGlennon, Jenny Routh, Heather Stephenson, Karen Sweet, and Lisa Zimmerman.
Author contributions
The idea for the study was conceived by Sue Dyson. The experiments were designed by Sue Dyson and Jeannine Berger. The experiments were performed by Sue Dyson, Jeannine Berger, and Jessica Mullard. The data were analyzed by Andrea Ellis. The paper was written by all authors.
Ethical considerations
The study was approved by the Clinical Ethical Review Committee of the Animal Health Trust (AHT 29 2015).
Conflict of interest
The authors declare no conflict of interest.
Supplementary data
Supplementary data related to this article can be found at http://dx.doi.org/10.1016/j.jveb.2016.11.005.
References
Bussières, G., Jacques, C., Lainay, O., Beauchamp, G., Leblond, A., Cadoré, J.L., Desmaizières, L.M., Cuvelliez, S., Troncy, E., 2008. Development of a composite orthopaedic pain scale in horses. Res. Vet. Sci. 85, 294-306.
Casey, V., McGreevy, P., O’Muiris, E., Doherty, O., 2013. A preliminary report on estimating the pressures exerted by a crank noseband in the horse. J. Vet. Behav.: Clin. Appl. Res. 8, 479-484.
Christensen, J., Beekmans, M., van Dalum, M., Van Dierendonck, M., 2014. Effects of hyperflexion on acute stress responses in ridden dressage horses. Physiol. Behav. 128, 39-45.
Cohen, J., 1960. A coefficient of agreement for nominal scales. Educ. Psychol. Meas. 20, 37-46.
Cook, W., Mills, D., 2009. Preliminary study of jointed snaffle vs crossunder bitless bridles: Quantified comparison of behavior in four horses. Equine Vet. J. 41, 827-830.
Dalla Costa, E., Minero, M., Lebelt, D., Stucke, D., Canali, E., Leach, M.C., 2014. Development of the horse grimace scale (HGS) as a pain assessment tool in horses undergoing routine castration. PLoS One 9, e92281.
Dyson, S., 2016a. Evaluation of poor performance in competition horses: a musculoskeletal perspective. Part 1 Clinical assessment. Equine Vet. Educ. 28, 284-293.
Dyson, S., 2016b. Equine performance and equitation science: clinical issues. J. Appl. Anim. Behav. Sci. In press.
Dyson, S., Greve, L., 2016. Subjective gait assessment of 57 sports horses in normal work: a comparison of the response to flexion tests, movement in hand, on the lunge and ridden. J. Equine Vet. Sci. 38, 1-7.
Eisersiö, M., Roepstorff, L., Weishaupt, M., Egenvall, A., 2013. Movements of the horse’s mouth in relation to horse-rider kinematic variables. Vet. J. 198, e33-e38.
Fuller, C., Bladon, B., Driver, A., Barr, A., 2006. The intra- and interassessor reliability of measurement of functional outcome by lameness scoring in horses. Vet. J. 171, 281-286.
Gleerup, K., Forkman, B., Lindegaard, C., Andersen, P., 2014. An equine pain face. Vet. Anaesth. Analg. 42, 103-114.
Gleerup, K.B., Lindegaard, C., 2016. Recognition and quantification of pain in horses: A tutorial review. Equine Vet. Educ. 28, 47-57.
Górecka-Bruzda, A., Kosinska, I., Jaworski, Z., Jezierski, T., Murphy, J., 2015. Conflict behavior in elite show jumping and dressage horses. J. Vet. Behav.: Clin. Appl. Res. 10, 137-146.
Greve, L., Dyson, S., 2014. The interrelationship of lameness, saddle slip and back shape in the general sports horse population. Equine Vet. J. 46, 687-694.
Greve, L., Dyson, S., Pfau, T., 2016a. Alterations in thoracolumbosacral movement when pain causing lameness has been improved by diagnostic analgesia. Vet. J. 48, 7-39.
Greve, L., Pfau, T., Dyson, S., 2016b. Alterations in body lean angle in lame horses before and after diagnostic analgesia in straight lines in hand and on the lunge. Equine Vet. J. In press.
Hall, C., Huws, N., White, C., Taylor, E., Owen, H., McGreevy, P., 2013. Assessment of ridden horse behavior. J. Vet. Behav. 8, 62-73.
Hall, C., Kay, R., Yarnell, K., 2014. Assessing ridden horse behavior: Professional judgment and physiological measures. J. Vet. Behav.: Clin. Appl. Res. 9, 22-29.
Hausberger, M., Fureix, C., Lesimple, C., 2016. Detecting horses’ sickness: In search of visible signs. J. Appl. Anim. Behav. Sci. 175, 41-49.
Keegan, K., Dent, E., Wilson, D., et al., 2010. Repeatability of subjective evaluation of lameness in horses. Equine Vet. J. 42, 92-97.
Landers, R.N., 2015. Computing intraclass correlations (ICC) as estimates of interrater reliability in SPSS. Winnower 2, e143518.81744.
Lindegaard, C., Thomsen, M., Larsen, S., Andersen, P., 2010. Analgesic efficacy of intra-articular morphine in experimentally induced radiocarpal synovitis in horses. Vet. Anaesth. Analg. 37, 171-185.
Ludewig, A., Gauly, M., König von Borstel, U., 2013. Effect of shortened reins on rein tension, stress and discomfort behavior in dressage horses. J. Vet. Behav.: Clin. Appl. Res. 8, e15-e16.
Manfredi, J., Rosenstein, D., Lanovaz, J., Nauwelaerts, S., Clayton, H., 2009. Fluoroscopic study of oral behaviors in response to the presence of a bit and the effects of rein tension. Comp. Exerc. Physiol. 6, 143-148.
McGreevy, P., Warren-Smith, A., Guisard, Y., 2012. The effect of double bridles and jaw clamping crank nosebands on facial cutaneous and ocular temperature in horses. J. Vet. Behav.: Clin. Appl. Res. 7, 142-148.
McHugh, M., 2014. Interrater reliability: the kappa statistic. Biochem. Med. (Zagreb) 22, 276-282.
McLean, A., McGreevy, P., 2010. Ethical equitation: Capping the price horses pay for human glory. J. Vet. Behav.: Clin. Appl. Res. 5, 203-209.
McLean, A., McGreevy, P., 2010. Horse-training techniques that may defy the principles of learning theory and compromise welfare. J. Vet. Behav.: Clin. Appl. Res. 5, 187-195.
Randolph, J.J., 2005. Free-marginal multirater kappa: An alternative to Fleiss’ fixed-marginal multirater kappa. Paper presented at the Joensuu University Learning and Instruction Symposium 2005, Joensuu, Finland, October 14-15, 2005. (ERIC Document Reproduction Service No. ED490661).
van Loon, J.P.A.M., Van Dierendonck, M.C., 2015. Monitoring acute equine visceral pain with the Equine Utrecht University Scale for Composite Pain Assessment (EQUUS-COMPASS) and the Equine Utrecht University Scale for Facial Assessment of Pain (EQUUS-FAP): A scale-construction study. Vet. J. 206, 356-364.
Visser, E., Van Dierendonck, M., Ellis, A., Rijksen, C., Van Reenen, C., 2009. A comparison of sympathetic and conventional training methods on responses to initial horse training. Vet. J. 181, 48-52.
Wathan, J., Burrows, A., Waller, B., McComb, K., 2015. EquiFACS: the equine facial action coding system. PLoS One 10, e0131738.