Journal of Science and Medicine in Sport (2008) 11, 214—217
TECHNICAL NOTE
Criterion-related validity and test-retest reliability of the 20 m Square Shuttle Test
Giorgos S. Metsios a,b,∗, Andreas D. Flouris a,c, Yiannis Koutedakis b,d, Allan Nevill b
a Institute of Human Performance and Rehabilitation, Greece
b School of Sport, Performing Arts and Leisure, Wolverhampton University, UK
c School of Health and Human Performance, Dalhousie University, Canada
d Department of Physical Education and Sports Sciences, University of Thessaly, Greece
Received 7 September 2006; received in revised form 30 November 2006; accepted 1 December 2006
KEYWORDS Field tests; Prediction; Repeatability; Agreement; Gold standard
Summary We assessed the validity and reliability of the new 20 m square shuttle run test (SST) for predicting maximal oxygen uptake (V̇O2max) and compared it with its predecessor, the 20 m Multistage Shuttle Run Test (MST). In a repeated-measures randomised-block design, 74 healthy adult males performed the SST, the MST and a treadmill test (TT). To assess reliability, 40 of the total 74 volunteers were randomly selected to perform the SST and MST twice. Unlike the SST (p > 0.05), mean predicted V̇O2max (pred V̇O2max) from the MST was significantly higher than that measured during the TT (p < 0.05). The pred V̇O2max from SST and MST correlated with TT V̇O2max at r = 0.95 (p < 0.001) and r = 0.63 (p < 0.001), respectively. Prediction error of SST was −0.3 ± 3.3 ml kg−1 min−1 with a coefficient of variation of ±3.5%, while the equivalent values for MST were 4.2 ± 7.3 ml kg−1 min−1 and ±7.4%. Mean test-retest pred V̇O2max did not differ for either SST or MST (p > 0.05), while the corresponding test-retest correlation coefficients were r = 0.85 (p < 0.001) and r = 0.72 (p < 0.001). Reliability errors in 95% limits of agreement were 0.3 ± 4.8 and 0.6 ± 6.8 ml kg−1 min−1 while coefficients of variation were ±5.2% and ±6.8% for the SST and MST, respectively. It is concluded that SST is a more valid proxy than MST for predicting laboratory V̇O2max based on the current procedures, while both tests are sufficiently reliable in healthy male adults. © 2007 Sports Medicine Australia. Published by Elsevier Ltd. All rights reserved.
∗ Corresponding author. E-mail address: [email protected] (G.S. Metsios).
1440-2440/$ - see front matter © 2007 Sports Medicine Australia. Published by Elsevier Ltd. All rights reserved.
doi:10.1016/j.jsams.2006.12.120
Introduction
Cardiorespiratory fitness field tests have been extensively adopted as proxy assessments of maximal oxygen uptake (V̇O2max) in performance-based, health-related and epidemiological research.1 The 20-m Multistage Shuttle Run Test (MST) is a practical and easily administered field test which has been frequently used as a proxy to predict laboratory-measured V̇O2max (mstd V̇O2max); yet, its validity has been questioned based on biomechanical (i.e., different technique and musculature employed from gold standard),2 metabolic (i.e., contribution of anaerobic pathways),1 and methodological considerations (i.e., adequate predictive capacity only in heterogeneous populations with wide range of fitness levels).1,3 Flouris et al. recently developed the 20-m Square Shuttle Run Test (SST), a 'square' variation of the classical MST, designed to reduce the test's turning angle from 180° to 90°. Despite the positive findings of the original study, the SST was cross-validated against a small sample of ten individuals. This suggested that further research was required to ensure that the SST can accurately predict V̇O2max. Hence, the purpose of the present study was to examine the validity and the reliability of the SST using a larger sample and to compare it with its predecessor, the MST.
Methods
Participants
Seventy-four non-smoking healthy males (age: 21.6 ± 2 years; mass: 77.4 ± 10 kg; height: 178.4 ± 7 cm) volunteered. Participants were recreational athletes, not specifically trained for any sport, and had no medical conditions. All participants signed informed consent after receiving explicit details regarding the procedures involved and extensive familiarisation with all protocols. The experimental protocol was approved by the Research Ethics Board of the University of Thessaly. Over a two-week period, participants performed, in random order, the SST, the MST, and a treadmill test (TT). Assessments were separated by three to four days of rest. To assess reliability, 40 randomly selected individuals performed both field tests twice, separated by a four-day rest period. Participants were asked to refrain from exercise during the period of their participation in the study. Moreover, they were instructed to avoid caffeine intake for at least 12 h prior to all testing sessions. All tests were carried out by the same investigators between 09:00 and 12:00 h.
Data collection
Field-testing
Both tests (i.e., SST and MST) were conducted according to previously described procedures.2,4 All assessments were conducted individually, to eliminate competition bias, in a rubber-floored gymnasium using the same pre-recorded audio-CD.2,4 Given the extensive existing documentation of the MST,3-6 this test will not be described in detail. In the SST, participants chose to run either clockwise or anticlockwise throughout, along the 20-m long sides of a square marked on the gymnasium floor. Cones were placed inside and outside the four corners to ensure running on course. Individuals were advised to perform wide turns to avoid disturbances in their running technique and were encouraged, particularly at the latter stages of the tests, to reach volitional exhaustion. Each participant started the test at one of the corners and followed the prescribed pace for as long as he was able to complete one side of the square and be at the next corner in synchrony (i.e., ±1 s) with the next sound signal emitted from the audio-CD. Testing was terminated when participants could no longer maintain the prescribed pace for two consecutive signals. Predicted V̇O2max (pred V̇O2max) for both SST and MST was calculated using the appropriate equations:2,4
• SST: pred V̇O2max = MAS × 3.679 − 7.185
• MST: pred V̇O2max = MAS × 6.592 − 32.678
where MAS is the maximal attained speed (km h−1). To ensure maximal performance, results were filed for further analyses only when the maximum heart rate, recorded at completion of both field tests (Polar Electro, Kempele, Finland), exceeded 185 beats min−1. The heart rate of all participants was above this criterion at completion of both tests.
Laboratory V̇O2max assessment
A modified Bruce treadmill protocol was utilised.2 The test involved a 5-min warm-up period of steady-state horizontal running at 9 km h−1 on a motorised treadmill, followed by speed increments of 1 km h−1 every 2 min until exhaustion. An automated gas analyser (Vmax 29, Sensormedics, USA) was used to record respiratory parameters every 20 s while individuals inspired room air. The test was terminated when the participants were unable to maintain the prescribed treadmill pace. The highest V̇O2 (ml kg−1 min−1) for any 20-s interval was recorded as the individual's mstd V̇O2max.
Statistical analyses
Prior to analyses, all variables were examined graphically and statistically for normality. Repeated-measures analysis of variance (ANOVA) with Bonferroni adjustment was used to compare mean performance values, while Pearson's correlation coefficients were used to assess linear relationships among the V̇O2max scores and the studied performance-related measures. The 95% limits of agreement7,8 were adopted to examine the validity and test-retest repeatability. Assessment of validity was further secured by calculating the squared cross-validity coefficients.9 The level of significance for all statistical analyses was set at p < 0.05.
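The two prediction equations given under Field-testing, and the 95% limits of agreement named under Statistical analyses, can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' analysis code: the function names are hypothetical, the limits of agreement are assumed to follow the usual Bland and Altman form (mean difference ± 1.96 × SD of the differences, taken as predicted minus measured), and the paired values in the usage lines are invented for demonstration and are not study data.

```python
from statistics import mean, stdev

# Coefficients as printed in the Field-testing section (refs 2 and 4).
SST_SLOPE, SST_INTERCEPT = 3.679, -7.185
MST_SLOPE, MST_INTERCEPT = 6.592, -32.678


def pred_vo2max_sst(mas_kmh: float) -> float:
    """Predicted VO2max (ml/kg/min) from SST maximal attained speed (km/h)."""
    return mas_kmh * SST_SLOPE + SST_INTERCEPT


def pred_vo2max_mst(mas_kmh: float) -> float:
    """Predicted VO2max (ml/kg/min) from MST maximal attained speed (km/h)."""
    return mas_kmh * MST_SLOPE + MST_INTERCEPT


def limits_of_agreement(predicted, measured):
    """Return (bias, half_width) of the 95% limits of agreement.

    ASSUMPTION: bias is the mean of (predicted - measured) and the
    half-width is 1.96 times the sample SD of those differences.
    """
    diffs = [p - m for p, m in zip(predicted, measured)]
    return mean(diffs), 1.96 * stdev(diffs)


if __name__ == "__main__":
    # Mean maximal attained speeds from Table 1 (15.0 and 12.9 km/h).
    print(round(pred_vo2max_sst(15.0), 1))  # 48.0
    print(round(pred_vo2max_mst(12.9), 1))  # 52.4

    # Invented paired values (ml/kg/min), for illustration only.
    predicted = [46.0, 50.5, 44.0, 52.0]
    measured = [47.0, 50.0, 45.0, 51.0]
    bias, half_width = limits_of_agreement(predicted, measured)
    print(f"{bias:.2f} +/- {half_width:.2f}")
```

Because Table 1 reports the mean speeds rounded to one decimal, feeding them into the equations only approximates the tabulated mean pred V̇O2max values (47.8 and 52.3 ml kg−1 min−1).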
Results
Validity
Table 1 presents the ANOVA results, 95% limits of agreement and percent coefficient of variation for selected performance indices obtained from SST, MST and TT. The 95% limits of agreement are reported as absolute measurements since pred V̇O2max for both SST and MST was found to be homoscedastic.8 The only statistically significant difference existed between MST pred V̇O2max and TT mstd V̇O2max (p < 0.001). Correlation coefficients between mstd V̇O2max and pred V̇O2max (Table 1) were used to calculate the squared cross-validity coefficients of 0.90 (i.e., 0.95²) and 0.40 (i.e., 0.63²) for SST and MST, respectively. Using the previously reported R² coefficients (i.e., SST = 0.77; MST = 0.90),2,4 the shrinkage (i.e., R² of original regression minus current squared cross-validity coefficient) for SST was −0.13 (−13%) and for MST it was 0.50 (50%).
Reliability
No significant differences (p > 0.05) were detected between test and retest for mean pred V̇O2max [i.e., (mean ± 95%CI) 47.3 ± 2.28 versus 47.6 ± 2.29 for SST and 50.6 ± 2.19 versus 51.2 ± 2.20 for MST], maximal attained speed, test duration and distance covered, while the test-retest intraclass correlation coefficients ranged from 0.82 to 0.85 (p < 0.001) and from 0.72 to 0.75 (p < 0.001) for the SST and MST, respectively. The 95% limits of agreement are reported as absolute measurements since pred V̇O2max for both SST and MST was found to be homoscedastic.8 The calculated pred V̇O2max 95% limits of agreement and percent coefficient of variation for SST were 0.31 ± 4.84 ml kg−1 min−1 and 5.2%, while the equivalent values for MST were 0.66 ± 6.82 ml kg−1 min−1 and 6.8%. Regarding pred V̇O2max, the test-retest correlation coefficients for SST and MST were r = 0.85 (p < 0.001) and r = 0.72 (p < 0.001), respectively.
Discussion
The present results indicated that the SST is a valid and reproducible proxy for the present mstd V̇O2max procedures, confirming the findings from the original SST study.2 The reduced validity of MST found here compared to some published data4,6 may be partly attributed to the homogeneous sample used here in terms of age, gender, training background and fitness level. Research utilising large heterogeneous samples, similar to that used in the MST validation studies, is subject to spurious linear relationships because of sample heterogeneity and, according to previous investigations, should be treated with caution.1 It should also be noted that the disparate treadmill protocol used in this study compared with that of Léger and Gadoury (start at 4.83 km h−1 and 0% inclination, with increments in speed and inclination thereafter) may have also contributed to the reduced MST validity. These observations would suggest that pred V̇O2max validity is increased when the duration and distance run of the proxy are similar to that of the laboratory test, providing support to the Energy Equilibrium concept,1 which holds that proxies should be designed to simulate closely each laboratory protocol used as reference standard.
The squared cross-validity coefficients changed by more than 0.10 in magnitude relative to the original R² for both SST and MST. The change in SST was −13%, that is, the squared cross-validity coefficient exceeded the original R², suggesting that the prediction equation can be generalised with confidence. This was not surprising, given the increase in sample size in the present study compared to the original, which is normally accompanied by an increase in the squared cross-validity coefficient.9 The greater shrinkage of 50% for MST remains rather unexplained and generates concerns regarding the generalisability of the test's prediction equation, given that the sample sizes of the current and the original4 study are rather similar. Yet it should be noted that the factors mentioned above regarding sample characteristics and laboratory testing protocols may have contributed to these results.
Regarding reliability, our data revealed that both SST and MST are sufficiently repeatable, exhibiting reasonably narrow limits of agreement and consistent results in separating different levels of performance. This is in line with previous findings10 reporting test-retest correlation coefficients of r = 0.992 and r = 0.986 for SST and MST, respectively, and no mean difference between V̇O2max values from the two field tests and laboratory testing.
The present findings would be further strengthened by collecting construct-validity evidence and by adopting a multiple-trials design in order to eliminate learning effects in performance of the field tests. Also, the reader should be aware that the presented SST distance values represent approximations based on a square with 20-m long sides, and do not incorporate the additional distance the participants covered by performing wide turns. It is worth noting that a major benefit of the MST is that it allows the assessment of a large sample of individuals simultaneously, whereas in the SST only four are allowed (one in each corner). Hence, the improved SST prediction of V̇O2max may be counteracted by a loss in practical application. It is concluded that the SST is a valid and reproducible assessment tool based on the present mstd V̇O2max procedures, and a more efficacious test than the MST for predicting laboratory mstd V̇O2max in healthy adult males.

Table 1 Mean ± 95% confidence intervals for all tests and correlation coefficients, 95% limits of agreement, and percent coefficient of variation for the two field tests in relation to the treadmill test (n = 74)

| Test | V̇O2max (ml kg−1 min−1) | r | 95%LoA (ml kg−1 min−1) | CV (%) | Speed (km h−1) | Distance (m) | Time (min:s) |
|------|------------------------|---|------------------------|--------|----------------|--------------|--------------|
| TT | 48.1 ± 2.39 | - | - | - | 16.3 ± 0.82 | 2951.1 ± 417.44 | 13:34 ± 1:33 |
| SST | 47.8 ± 2.19 | 0.95ᵃ | −0.28 ± 3.25 | 3.46 | 15.0 ± 0.72ᵃ | 2601.9 ± 356.01ᵃ | 13:19 ± 1:21ᵃ |
| MST | 52.3 ± 2.28ᵃ | 0.63ᵃ | 4.23 ± 7.25 | 7.37 | 12.9 ± 0.44ᵃ | 1626.4 ± 176.37ᵃ | 9:12 ± 0:48ᵃ |

Key: V̇O2max, maximal oxygen uptake; r, correlation coefficient with TT; 95%LoA, 95% limits of agreement against TT; CV, percent coefficient of variation against TT; TT, treadmill test; SST, 20-m Square Shuttle Test; MST, 20-m Multistage Shuttle Run Test.
ᵃ ANOVA and correlation coefficients between the field tests and TT statistically significant at p < 0.001.
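The squared cross-validity coefficients, shrinkage values and coefficients of variation reported in this note can be re-derived from the published summary statistics. The sketch below is a worked check rather than the authors' procedure. The variable and function names are hypothetical, and the coefficient of variation formula (SD of the differences divided by the grand mean of the two tests, with the SD recovered as the limits-of-agreement half-width divided by 1.96) is an assumption: the text does not state the formula, but this reading reproduces the reported percentages.

```python
# Summary values copied from the Results and Table 1.
r_sst, r_mst = 0.95, 0.63              # correlation of pred VO2max with TT
r2_orig_sst, r2_orig_mst = 0.77, 0.90  # R^2 of the original regressions (refs 2, 4)

# Squared cross-validity coefficient: the square of the correlation.
cross_validity_sst = r_sst ** 2
cross_validity_mst = r_mst ** 2

# Shrinkage: original R^2 minus the current squared cross-validity coefficient.
shrinkage_sst = r2_orig_sst - cross_validity_sst
shrinkage_mst = r2_orig_mst - cross_validity_mst


def cv_percent(loa_half_width: float, mean_a: float, mean_b: float) -> float:
    """Percent coefficient of variation from limits-of-agreement summary data.

    ASSUMPTION (not stated in the text): CV = 100 * SD_diff / grand mean,
    with SD_diff = half-width / 1.96.
    """
    sd_diff = loa_half_width / 1.96
    grand_mean = (mean_a + mean_b) / 2
    return 100 * sd_diff / grand_mean


if __name__ == "__main__":
    print(round(cross_validity_sst, 2), round(cross_validity_mst, 2))
    print(round(shrinkage_sst, 2), round(shrinkage_mst, 2))
    # Validity against the treadmill test (Table 1 means).
    print(round(cv_percent(3.25, 47.8, 48.1), 2))  # SST
    print(round(cv_percent(7.25, 52.3, 48.1), 2))  # MST
    # Test-retest reliability (means from the Reliability section).
    print(round(cv_percent(4.84, 47.3, 47.6), 1))  # SST
    print(round(cv_percent(6.82, 50.6, 51.2), 1))  # MST
```

A negative shrinkage, as for the SST, means the squared cross-validity coefficient in this sample exceeded the R² of the original regression.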
Practical implications
• Given its increased predictive strength, the 20-m Square Shuttle Run Test can be administered with reasonable accuracy for the evaluation of aerobic fitness in male recreational athletes.
• Both the 20-m Square Shuttle Run Test and the 20-m Multistage Shuttle Run Test provide reliable results and can be used for estimation of performance progression across time.
References
1. Flouris AD, Klentrou P. The need for energy equilibrium. J Sci Med Sport 2005;8(2):129-33.
2. Flouris AD, Koutedakis Y, Nevill A, et al. Enhancing specificity in proxy-design for the assessment of bioenergetics. J Sci Med Sport 2004;7:197-204.
3. O'Gorman D, Hunter A, McDonnacha C, et al. Validity of field tests for evaluating endurance capacity in competitive and international-level sports participants. J Strength Cond Res 2000;14:62-7.
4. Léger L, Gadoury C. Validity of the 20 m shuttle run test with 1 min stages to predict VO2max in adults. Can J Sport Sci 1989;14:21-6.
5. Ahmaidi S, Collomp K, Caillaud C, et al. Maximal and functional aerobic capacity as assessed by two graduated field methods in comparison to laboratory exercise testing in moderately trained subjects. Int J Sports Med 1992;13:243-8.
6. Léger L, Mercier D, Gadoury C, et al. The multistage 20 metre shuttle run test for aerobic fitness. J Sports Sci 1988;6(2):93-101.
7. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;1(8476):307-10.
8. Nevill AM, Atkinson G. Assessing agreement between measurements recorded on a ratio scale in sports medicine and sports science. Br J Sports Med 1997;31:314-8.
9. Algina J, Keselman H. Cross-validation sample sizes. Appl Psychol 2000;24:173-9.
10. Metsios GS, Flouris AD, Koutedakis Y, et al. The effect of performance feedback on cardiorespiratory fitness field tests. J Sci Med Sport 2006;9(3):263-6.