Research in Developmental Disabilities 34 (2013) 234–245
Validation of a questionnaire to measure mastery motivation among Chinese preschool children
Cynthia Leung a,*, S.K. Lo b
a Department of Applied Social Sciences, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong
b The Hong Kong Institute of Education, 10 Lo Ping Road, Tai Po, Hong Kong
ARTICLE INFO
Article history: Received 9 June 2012; Accepted 23 July 2012; Available online 7 September 2012
Keywords: Motivation; Children; Chinese; Assessment

ABSTRACT
The aim of this study was to validate a questionnaire on mastery motivation (task and effort) for use with Chinese preschool children in Hong Kong. A parent version and a teacher version were developed and evaluated. Participants included 457 children (230 boys and 227 girls) aged four and five years old, their preschool teachers and their parents. Further, 44 children (39 boys and 5 girls) with developmental disabilities were recruited. The children were assessed on the cognitive sub-test of the Preschool Development Assessment Scale (PDAS). Their parents completed the task and effort motivation scales, as well as the Strengths and Difficulties Questionnaire (SDQ). Their teachers also completed the task and effort motivation scales. Rasch analysis results provided support for the unidimensionality of the parent and teacher versions of the two motivation scales. The parent and teacher versions of the two motivation scales correlated positively with the PDAS cognitive sub-test and the SDQ prosocial scale scores, and negatively with the SDQ total problem behavior scores. Children with developmental disabilities were assigned lower scores by their teachers and parents on the two motivation scales, compared with children with typical development. The reliability estimates (Cronbach’s Alpha) of the parent and teacher versions of the two motivation scales were above .70. The results suggested that the task and effort motivation scales were promising instruments for the assessment of motivation among Chinese preschool children. © 2012 Elsevier Ltd. All rights reserved.
1. Introduction

The impact of motivation on academic achievement and quality of life is well documented (Bello, Steffen, & Hayashi, 2011; McInerney & Ali, 2006). Grant and Dweck (2003) maintained that achievement motivation could best be understood in terms of achievement goals. McInerney, Marsh, and Yeung (2003) suggested that the setting of goals was the catalyst for directing people’s energy toward the achievement of intended outcomes. Schunk (2003) defined a goal as a standard by which people judged their success. According to Grant and Dweck (2003), there were two classes of goals, performance goals and learning (or mastery) goals. Performance goals were related to validation of one’s ability whereas learning goals focused on striving to learn and master. The focus of performance goals was on the self whereas the focus of learning or mastery goals was on the learning activity (Maehr, 2001). Learning or mastery goals were positively associated with intrinsic motivation and academic achievement. Mastery goals have also been found to be associated with self-efficacy, positive emotions and general well-being (Kaplan & Maehr, 2007). Some suggested that performance
* Corresponding author. Tel.: +852 2766 4670; fax: +852 2773 6558. E-mail address:
[email protected] (C. Leung). 0891-4222/$ – see front matter © 2012 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.ridd.2012.07.023
goals could be divided into approach goals (achieving success) and avoidance goals (avoiding failure). While the association between performance-approach goals and outcomes was still under debate, performance-avoidance goals were negatively related to performance (Kaplan & Maehr, 2007).

Kaplan and Maehr (2007) pointed out that the two-goal framework commonly used in the literature might not be adequate in describing student motivation. They suggested that extrinsic goals (getting extrinsic incentives) and social goals (pleasing others or opportunities for social interaction) were also relevant. Maehr (2008) also pointed out that the interpersonal nature of motivation should be taken into consideration. McInerney et al. (2003) developed their hierarchical, multidimensional model with ten goal orientations (“effort, task, sense of purpose, recognition, competition, power, token rewards, social concern, affiliations and social dependence”) (pp. 337–338) subsumed under three second-order factors: mastery, performance and social goals. In a later cross-cultural study involving high school students in Australia, Hong Kong, the United States and Africa, a multidimensional, hierarchical motivation model was confirmed (McInerney & Ali, 2006). This model consisted of one third-order factor (general motivation) and four second-order factors, each with two specific first-order factors. The second-order factors and their first-order factors were mastery (task, effort), performance (competition, social power), social (affiliation, social concern) and extrinsic (praise, token).

1.1. Motivation in preschool children

Most of the questionnaire measures of motivation were developed for college students, with some for high school or primary school students, and most of the studies on motivation were based on these students (Kaplan & Maehr, 2007). However, studies on motivation among preschool children were more limited.
The limited studies on motivation among preschool children showed that motivation was associated with learning and behavior, even at this early age. An earlier study found that preschool task orientation (as measured by teacher report questionnaires) positively predicted first grade reading skills (Salonen, Lepola, & Niemi, 1998). In a longitudinal study on lower primary school children, it was found that mathematics performance in first grade predicted subsequent mathematics task motivation, which then predicted mathematics performance in second grade. In this study, an interview schedule with children was used to measure mathematics task motivation (Aunola, Leskinen, & Nurmi, 2006), but the study was limited to mathematics motivation only. In another study on preschool children, it was found that problem behavior was negatively associated with competence motivation, based on teacher-report questionnaires (Bulotsky-Shearer, Fernandez, Dominguez, & Rouse, 2011). In a study on preschool children in mainland China, Zhou and Salili (2008) asked parents to rate intrinsic motivation in reading in terms of children’s persistent and voluntary engagement in reading activities. In that study, children’s reading motivation was associated with home literacy environment. However, the measures were related to reading behavior only and might not generalize to general academic activities.

In the above studies on preschool children, some motivation measures were on specific academic domains such as reading or mathematics, and might not generalize to learning in general. Other measures on general motivation were based on teacher report and there was no study on parent report. However, parent ratings have been shown to be valid for assessment of young children (Caselman & Self, 2008).
The aim of this study was to validate a questionnaire on mastery motivation (task and effort) based on the Inventory of School Motivation (McInerney & Ali, 2006) for use with Chinese preschool children in Hong Kong. A parent version and a teacher version would be developed and evaluated. Parent and teacher ratings were regarded as cost-effective and valid methods for assessing young children (Glascoe, 2000; Koth, Bradshaw, & Leaf, 2009). The availability of parent and teacher report could provide for triangulation and better understanding of children’s behavior in different settings. An understanding of preschool children’s motivation could help identify children in need of learning support, and could support the evaluation of intervention programs targeting these children. The Inventory of School Motivation was chosen because it had been used with Hong Kong Chinese students. Only the two sub-scales on mastery motivation (task and effort) were used in this study as mastery motivation has consistently been shown to be associated with academic performance and general well-being. Another reason for focusing only on mastery motivation scales was to minimize the time demand on teachers and parents.

1.2. Calculation

In terms of evidence-based assessment, the unidimensionality of items should be examined so as to justify the summing up of item scores to form a total score (Holmbeck & Devine, 2009). In this study, the unidimensionality of the Task and Effort Scales would be investigated through the examination of the infit and outfit mean square statistics. Differential item functioning (DIF) analysis would also be conducted to examine item bias, that is, whether the items functioned differently for different groups (Sattler, 2008). The rating scale properties would be investigated to examine whether teachers and parents could meaningfully differentiate between the various rating points on these two scales.
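As an illustration of the infit and outfit mean square statistics mentioned above, the following is a minimal sketch and not the authors' procedure (the study used Winsteps). It assumes that, for one item, the observed ratings, the model-expected ratings and the model variances for each person are already available from a fitted Rasch model; the function names and the example inputs are hypothetical.

```python
def fit_mean_squares(observed, expected, variance):
    """Return (infit, outfit) mean squares for one item across persons.

    observed: ratings given to the item (one per person)
    expected: model-expected ratings from a fitted Rasch model (assumed given)
    variance: model variances of those ratings (assumed given)
    """
    squared_residuals = [(x - e) ** 2 for x, e in zip(observed, expected)]
    # Outfit: unweighted mean of squared standardized residuals.
    outfit = sum(r / v for r, v in zip(squared_residuals, variance)) / len(observed)
    # Infit: information-weighted mean square (weights are the model variances).
    infit = sum(squared_residuals) / sum(variance)
    return infit, outfit


def within_range(value, low=0.60, high=1.40):
    """Check a mean square against the 0.60-1.40 range used in this paper."""
    return low <= value <= high


# Hypothetical inputs in which each squared residual equals its model variance,
# so both statistics equal their expected value of 1.
infit, outfit = fit_mean_squares([1, 3], [2, 2], [1, 1])
print(infit, outfit, within_range(infit), within_range(outfit))
```

Values near 1 indicate that the item behaves as the model predicts; values above the upper bound suggest unmodeled noise, and values below the lower bound suggest responses that are more predictable than the model expects.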
The convergent validity of the Task and Effort Scales, which is the correlation between the target measure and other related measures, would be examined in terms of the correlation between these two scales with a measure of cognitive skills (Cognitive sub-test of the Preschool Developmental Assessment Scale) and a measure of children’s behavior problems (Strengths and Difficulties Questionnaire). The criterion-related validity of the Task and Effort Scales, which is the association between these two scales and measures of non-test behaviors, would be examined by comparing the scores of children with
developmental disabilities and children with typical development. Internal consistency reliability would be examined in terms of Cronbach’s Alpha and test–retest reliability would be evaluated in terms of intra-class correlation (Yen & Lo, 2002).

The hypotheses were:
1. The teacher and parent versions of the Task and Effort Scales would correlate positively with children’s cognitive development scores and prosocial behavior scores, and negatively with problem behavior scores.
2. Children with typical development would be rated higher by their parents and teachers on the Task and Effort Scales, when compared with children with developmental disabilities.
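For readers unfamiliar with the internal consistency index named above, the sketch below shows one standard way to compute Cronbach's Alpha from a persons-by-items matrix of ratings. It is an illustration only: the ratings are made up, the function name is hypothetical, and the study itself used SPSS.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item ratings."""
    k = len(rows[0])  # number of items
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the summed total score.
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Made-up ratings on a 4-item, 4-point scale (1 = never ... 4 = always).
ratings = [
    [4, 4, 3, 4],
    [3, 3, 3, 2],
    [2, 1, 2, 2],
    [4, 3, 4, 4],
    [1, 2, 1, 1],
]
print(round(cronbach_alpha(ratings), 2))
```

Alpha rises when items covary strongly relative to their individual variances, which is why it is read as evidence that the items can be summed into one total score.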
2. Method

2.1. Participants

The participants included 492 preschool children (245 girls and 247 boys), their teachers and parents. There were 242 4-year-old children (120 girls and 122 boys) and 250 5-year-old children (125 girls and 125 boys). The inclusion criteria were: (i) these children and at least one of their parents were Hong Kong residents who normally resided in Hong Kong; (ii) they and their parents should be Cantonese-speaking (Cantonese being the language spoken by 90% of the Hong Kong population [Census and Statistics Department, 2007]); and (iii) the children should be currently attending preschools. There were 457 participants (230 boys and 227 girls) with complete data (preschool). They included 221 4-year-old children (112 girls and 109 boys) and 236 5-year-old children (115 girls and 121 boys). For a correlation of .3, a sample size of 300 could provide a 95% confidence interval with a width of about 0.2.

Besides the preschool group, there were 56 children in integrated programs (IP) for children with developmental disabilities (26 4-year-old children and 30 5-year-old children). Integrated programs in mainstream preschools were provided for children with developmental disabilities (e.g. developmental delay, autism). Children were admitted into these programs only after assessment by professionals such as pediatricians or psychologists. The inclusion criteria for the IP children were the same as those of the preschool children described. There were 22 4-year-old children (18 boys and 4 girls) and 22 5-year-old children (21 boys and 1 girl) with complete data. Assuming that IP children were at least one standard deviation below preschool children in measures of development, a sample size of 16 per age group was considered adequate (α = .05, power = .80) (Cohen, 1992).

Based on the investigator’s contacts, preschools in all administrative districts (18 districts) of Hong Kong were approached.
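The two sample size statements above can be checked with a short calculation. The paper does not say how the figures were derived (it cites Cohen, 1992, for the second), so the sketch below rests on two assumptions: a Fisher z transformation for the confidence interval of a correlation, and a two-sided normal approximation for comparing two group means. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def correlation_ci(r, n, level=0.95):
    """Approximate confidence interval for a correlation via Fisher's z."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z_crit / math.sqrt(n - 3)  # standard error of z is 1/sqrt(n - 3)
    z = math.atanh(r)
    return math.tanh(z - half_width), math.tanh(z + half_width)


def n_per_group(d, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided two-group test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


low, high = correlation_ci(0.3, 300)
print(round(high - low, 2))  # interval width for r = .3, n = 300
print(n_per_group(1.0))      # group size for a difference of one standard deviation
```

Under these assumptions the interval width comes out close to 0.2 and the group size comes out at 16, in line with the figures in the text. Exact t-based power tables can differ slightly from the normal approximation used here.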
At least one preschool from each district was invited to participate, to ensure that all districts could be covered, and all preschools that agreed to participate were included. The number of preschools in each district ranged from one to six, with 20–32 children from each district. Preschools with integrated programs were also recruited from each of the 18 administrative districts and all preschools that consented to participate were included. The number of IP children per district ranged from one to five, with the exception of one district with 12 children. The number of preschools with integrated programs per district ranged from one to three.

Unless otherwise stated, the analyses reported in the rest of this paper were based on the 501 participants with complete data (457 preschool children and 44 IP children). The demographic characteristics of participants with complete data are presented in Table 1.

2.2. Measures

Effort Scale – this consisted of 7 items based on the 5-point effort scale of McInerney and Ali (2006). A 4-point version of this scale was used in a recent study with Chinese secondary school students in Hong Kong (Mok, Wong, & Lau, 2010) with satisfactory reliability (.84). In the present study, the format and wording were slightly changed from the original self-rating version to statements about children’s behavior to be rated by parents and teachers. Teachers and parents were requested to rate their students/children on each statement on a 4-point scale (1 = never, 2 = seldom, 3 = sometimes, 4 = always) following Mok et al. (2010). A mini pilot trial was conducted with four preschool teachers. They were each asked to think of three students who were above average, average, and below average in their class and to rate them on the Effort Scale. The results indicated a significant overall difference among the three groups (above average, average, and below average children), F(2, 9) = 8.16, p = .012, and the reliability was .97.
Task Scale – this consisted of 4 items based on the Task Scale of McInerney and Ali (2006), which was a 5-point version. The reliability of the 4-point version used in the recent study with Chinese secondary school students in Hong Kong (Mok et al., 2010) was .78. The format and wording were changed from the original self-rating version to statements about children’s behavior to be rated by parents and teachers, and the 4-point version of Mok et al. (2010) was used. Teachers and parents were requested to rate their students/children on each statement on a 4-point scale (1 = never, 2 = seldom, 3 = sometimes, 4 = always). The mini-pilot (see above) results again indicated an overall difference among the three groups (above average, average, and below average children), F(2, 9) = 4.98, p = .035, and the reliability was .87.

Strengths and Difficulties Questionnaire (SDQ) – this scale on children’s behavior consisted of five sub-scales (emotional problems, conduct problems, hyperactivity, peer relationship problems and prosocial behavior), each with five items. In the
Table 1 Demographic characteristics of participants. Values are n (%) or mean (SD).

| Characteristic | Preschool participants (n = 457) | Integrated program participants (n = 44) | Significance |
|---|---|---|---|
| Sex – boys | 230 (50.3%) | 39 (88.6%) | χ²(1) = 23.69, p < .001 |
| Age – 4-year-old | 221 (48.4%) | 22 (50.0%) | χ²(1) = 0.04, p = .835 |
| Born in Hong Kong | 420 (93.1%) | 43 (97.7%) | χ²(1) = 1.40, p = .236 |
| Relationship with child – mother | 352 (77.7%) | 32 (72.7%) | χ²(2) = 1.17, p = .556 |
| Relationship with child – father | 91 (20.1%) | 10 (22.7%) | |
| Relationship with child – others | 10 (2.2%) | 2 (4.5%) | |
| Family – nuclear | 288 (64.0%) | 35 (81.4%) | χ²(3) = 7.44, p = .059 |
| Family – extended | 143 (31.8%) | 6 (14.0%) | |
| Family – reconstituted | 3 (0.7%) | 1 (2.3%) | |
| Family – others | 16 (3.6%) | 1 (2.3%) | |
| Marital status – married/de facto | 410 (91.7%) | 38 (88.4%) | χ²(1) = 0.56, p = .399 |
| Marital status – single/widowed/separated/divorced | 37 (8.3%) | 5 (11.6%) | |
| Mother’s education – junior secondary or below | 138 (30.7%) | 16 (38.1%) | χ²(1) = 0.97, p = .326 |
| Father’s education – junior secondary or below | 148 (33.3%) | 12 (30.0%) | χ²(1) = 0.18, p = .668 |
| Mother’s employment – employed | 199 (43.5%) | 19 (43.2%) | χ²(1) = 0.002, p = .963 |
| Father’s employment – employed | 338 (74.0%) | 31 (70.5%) | χ²(1) = 0.25, p = .614 |
| Family income – HK$19,999 or below (a) | 208 (47.5%) | 27 (65.9%) | χ²(1) = 5.06, p = .024 |
| On social welfare | 35 (7.9%) | 8 (20.5%) | χ²(1) = 7.02, p = .016 |
| Child’s length of residence in Hong Kong | 4.74 (1.27) | 4.86 (0.89) | t(483) = 0.59, p = .556 |
| Mother’s length of residence in Hong Kong | 22.09 (15.29) | 22.90 (16.40) | t(453) = 0.31, p = .754 |
| Father’s length of residence in Hong Kong | 34.83 (13.08) | 36.63 (14.10) | t(442) = 0.81, p = .419 |
| Mother’s age | 35.74 (4.97) | 36.67 (4.92) | t(474) = 1.14, p = .255 |
| Father’s age | 40.77 (6.85) | 42.05 (7.13) | t(472) = 1.13, p = .259 |
| Number of children at home | 1.79 (0.69) | 1.86 (0.84) | t(493) = 0.59, p = .556 |

(a) The median monthly household income in Hong Kong is HK$17,250 (Census and Statistics Department, 2007).
present study, parents were requested to rate each item on a 3-point scale from 1 (not true) to 3 (certainly true). A high score indicated higher endorsement of the behavior domain. A total problem behavior score was calculated by summing up the raw scores of the first four sub-scales. The validity of the SDQ could be demonstrated by its high correlation with the Child Behavior Checklist and its ability to discriminate between psychiatric and dental cases (Goodman & Scott, 1999). The Chinese version of the scale has been validated by Lai et al. (2010), and used with Chinese preschool children in Hong Kong (Leung, Sanders, Leung, Mak, & Lau, 2003). In the present study, the reliability estimates (Cronbach’s Alpha) were: .59 (emotional problems), .52 (conduct problems), .73 (hyperactivity), .46 (peer relationship problems), .69 (prosocial behavior) and .77 (total problem behavior).

Cognitive sub-test of the Preschool Developmental Assessment Scale (PDAS) (Leung, Mak, Lau, Cheung, & Lam, 2010) – the PDAS, an individually administered test, was developed for Hong Kong Chinese preschool children aged between three and six years old. The cognitive sub-test consisted of 40 items on basic preschool concepts such as color, shape, and classification. Children responded by pointing to the correct answer or by making verbal responses. This scale could discriminate between children of different ages, and children with typical and atypical development, with a reliability of .93 (KR-20) (Leung et al., 2010). In the present study, the reliability estimate (KR-20) was .88.

Demographic information – parents were requested to supply basic demographic information such as age, sex and length of residence in Hong Kong of target children, relationship of parent participant to target child, parents’ age and length of residence in Hong Kong, parents’ education, parents’ employment, marital status, family type, family income and social welfare status.

2.3. Procedures

Preschools were approached by the investigators, based on their contacts. In each participating preschool, using the class list as the sampling frame, 4–12 children from each age group (4-year-old and 5-year-old) were selected randomly, using random numbers generated by a random number generator. Upon obtaining parent consent, the teachers of the selected children were requested to complete the Effort and Task Scales and the parents of the selected children completed the SDQ and the Effort and Task Scales. Trainee educational psychologists or research assistants, who were not connected with the preschools, administered the PDAS cognitive sub-test to the selected children.

The investigators also approached preschools with integrated programs, based on their contacts. In each participating preschool, the parents of all 4- and 5-year-old children enrolled in the integrated program were contacted to request their consent to participate in the study. Upon obtaining parent consent, the teachers of the children were requested to complete the Effort and Task Scales and the parents of these children completed the SDQ and the Effort and Task Scales. Trainee educational psychologists or research assistants, who were not connected with the preschools, administered the cognitive sub-test of the PDAS to these children.

There were 203 teachers who participated in this study. The mean number of questionnaires completed by each teacher was 2.47. The median and mode were both 2 (maximum = 9, minimum = 1).
To examine test–retest reliability, 84 parents and 82 teachers were approached approximately four weeks after their completion of the questionnaires to complete the Task and Effort Scales again. This study was approved by the ethics committee of The Hong Kong Polytechnic University.

2.4. Statistical analysis

Unidimensionality, DIF and category functioning of the Task and Effort Scales were examined using Rasch analysis. Independent t tests, correlation and multiple regression were used in the investigation of the validity of the Task and Effort Scales. The software used included Winsteps 3.69.1 (Rasch analysis) and the Statistical Package for the Social Sciences (version 19).

3. Results

There was a difference in integrated program (IP) status between participants with complete and incomplete data,
χ²(1) = 13.14, p = .001. There was a higher percentage of IP children with incomplete data. There were 12 (21.4%) IP children with incomplete data whereas the corresponding figure among the preschool children was 35 (7.1%). Besides, participants with complete and incomplete data differed in terms of PDAS cognitive sub-test scores, t(546) = 2.21, p = .028, teacher version of Effort Scale, t(545) = 2.08, p = .038, and parent version of Effort Scale, t(537) = 2.30, p = .022. The scores of participants with incomplete data were lower than those with complete data.

Among participants with complete data, there were differences between IP children and preschool children, in terms of sex, family income, and social welfare status. There was a lower percentage of boys in the preschool group than in the IP group. A higher percentage of IP children than preschool children came from families with a monthly income of HK$19,999 or below. A higher percentage of IP children’s families were on social welfare, compared with preschool children. The details are presented in Table 1.

There were also differences between 4-year-old and 5-year-old children, in terms of relationship of participant (person completing the parent-version questionnaires) to target child, χ²(2) = 6.81, p = .033, maternal employment, χ²(1) = 5.27, p = .002, and family income, χ²(1) = 16.47, p < .001. There were more mother participants among the 5-year-olds (n = 207, 80.9%) than among the 4-year-olds (n = 177, 73.4%). More mothers of 5-year-olds were in the workforce (n = 125, 48.4%), compared with those of 4-year-olds (n = 93, 38.3%). There were more families with a monthly family income of HK$20,000 or above among the 5-year-olds (n = 147, 60.0%) than among the 4-year-olds (n = 97, 41.5%).
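The group comparisons on categorical variables reported above and in Table 1 are chi-square tests. As a worked illustration, the sketch below computes a Pearson chi-square statistic (without continuity correction, which is an assumption about how the reported values were obtained) for the sex by group comparison, using the counts given in the paper: 230 boys and 227 girls in the preschool group, 39 boys and 5 girls in the IP group. The function name is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for a 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom the upper-tail probability has a closed form.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Rows: preschool group, IP group. Columns: boys, girls.
statistic, p_value = chi_square_2x2(230, 227, 39, 5)
print(round(statistic, 2), p_value < 0.001)
```

With these counts the statistic agrees with the value reported for sex in Table 1, which suggests the uncorrected Pearson statistic was used for that row.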
As the reliability estimates of SDQ emotional problems, conduct problems, and peer relationship problems sub-scales were not satisfactory (<.70), the total problem behavior score was used as a measure of disruptive behavior in the subsequent analyses reported. Though the reliability estimate of the prosocial behavior sub-scale was only .69, it was included in the analysis to provide an indicator of positive social behavior.

3.1. Effort Scale

3.1.1. Rasch analysis

Unidimensionality of the teacher version of the Effort Scale was investigated through examination of the infit and outfit mean squares. Using the criterion of 0.60–1.40 (Bond & Fox, 2007), all items were within the recommended range. The mean infit mean square was 0.98 (SD = 0.13) and the mean outfit mean square was 0.95 (SD = 0.19). The person reliability and separation measures were .85 and 2.41. The item reliability and separation measures were .99 and 8.38. Principal component analysis (PCA) of the residuals indicated that the variance explained by the measure was 63.7% and the variance explained by items was 14.9%, which was less than two times the variance explained by the first contrast (9.8%). The eigenvalue of the first contrast was 1.9, indicating that fewer than two items were involved (Linacre, 2009). According to Linacre (2009), the eigenvalue of the largest unexplained contrast should be less than 3 for the measure to be unidimensional. For the rating scale properties, the average measures increased monotonically from 2.82 for category 1 to 6.88 for category 4. The step calibrations also increased monotonically and they were more than 1.4 logits apart. There were 5 items with fewer than 10 responses per category and these were all category 1 responses. The results supported the unidimensionality of the teacher version of the Effort Scale and provided justification for summing up the scores to form a total score.
The results also indicated that teachers could meaningfully differentiate between the four categories. The item map (Fig. 1) indicated that the scale was relatively easy for the participants and there were not enough items to target participants at the higher end.

Unidimensionality of the parent version of the Effort Scale was investigated through examination of the infit and outfit mean squares. Using the criterion of 0.60–1.40 (Bond & Fox, 2007), there was one item with infit (1.52) and outfit (1.81) statistics outside the recommended range (item 1 – willing to work a long time at school work that is interesting). The mean infit mean square was 1.01 (SD = 0.22) and the mean outfit mean square was 1.03 (SD = 0.33). The person reliability and separation measures were .81 and 2.09. The item reliability and separation measures were .99 and 10.60. Principal component analysis (PCA) of the residuals indicated that the variance explained by the measure was 61.4% and the variance explained by items was 20.7%, which was 2.4 times the variance explained by the first contrast (8.6%). The eigenvalue of the first contrast was 1.6. For the rating scale properties, the average measures increased monotonically from 3.31 for category 1 to 5.62 for category 4. The step calibrations also increased monotonically and they were more
Fig. 1. Item map for teacher version of Effort Scale.
than 1.4 logits apart. There were 6 items with fewer than 10 responses per category and these were mainly category 1 and 2 responses. The results provided some support for the unidimensionality of the parent version of the Effort Scale and justification for summing up the scores to form a total score. The results also indicated that parents could meaningfully differentiate between the four categories. The item map (Fig. 2) indicated that the scale was relatively easy for the participants and there were not enough items to target participants in the higher range.

For DIF, with the parent version of the Effort Scale, taking into consideration multiple comparison correction and the requirement that the DIF contrast should be above .50, there was no difference for sex or IP status. For age difference, item 7 (always try to do better in school work) was more difficult for the 5-year-old group. For the teacher version, there was no item with item bias for sex, age or IP status.

The correlation between the raw scores and logit scores of the teacher version was .99 (n = 501). The correlation between the raw scores and logit scores of the parent version was also .99 (n = 501). These suggested that the logit scores were meaningful representations of the raw scores.

3.1.2. Reliability

The reliability of the parent version was .89 and the reliability of the teacher version was .92. Test–retest reliability (intra-class correlation) for the teacher version was .61 (95%CI: .45, .73) (n = 82) and that for the parent version was .69 (95%CI: .55, .78) (n = 84).

3.1.3. Validity

Though the fit statistics of one item in the parent version were less satisfactory, it was decided to retain this item so that the parent and teacher versions could be consistent. Furthermore, the other Rasch analysis results of this original parent version were satisfactory.
In terms of its validity, the Effort Scale (both parent and teacher versions) correlated positively with the PDAS cognitive sub-test and SDQ prosocial behavior scores and negatively with SDQ total problem behavior score. Children who were rated by their teachers and parents as having higher effort motivation achieved higher scores on the PDAS cognitive sub-test items, and were rated by their parents as more prosocial and less disruptive in behavior. The details are in Table 2. Separate analyses by age group resulted in the same pattern except that for 5-year-old children, the correlation between SDQ prosocial behavior and the teacher version of the Effort Scale was not significant (r = .10, p = .097, n = 258). The results were similar when using logit scores. Validity was further investigated by examining the difference between the IP group and the preschool group. Independent t test results indicated a significant difference between the two groups on both the parent and teacher versions
Fig. 2. Item map for parent version of Effort Scale.
Table 2 Correlation between Effort and Task Scales with PDAS and SDQ (n = 501).

| Scale | PDAS cognitive sub-test | SDQ prosocial behavior | SDQ total problem behavior |
|---|---|---|---|
| Effort (teacher) | .38** | .20** | −.32** |
| Effort (parent) | .26** | .45** | −.46** |
| Task (teacher) | .35** | .20** | −.25** |
| Task (parent) | .28** | .43** | −.37** |

** p < .001.
of the Effort Scale. The scores of IP students were lower than those of preschool students. The results were similar with separate analyses by age group, with the exception of the parent version of the Effort Scale for 4-year-old children. The details are in Table 3. The results were similar when using logit scores.

As the IP and preschool children differed in terms of family income, social welfare status and sex, multiple regression was performed to examine the effect of IP status on Effort Scale scores, after controlling for family income, social welfare status, and sex. The results were significant for the parent version, F(4, 462) = 6.23, p < .001, and the teacher version, F(4, 462) = 12.67, p < .001. The effect of IP status was significant for the parent version (β = .13, t = 2.77, p = .006) and the teacher version (β = .24, t = 5.38, p < .001) after controlling for family income, social welfare status, and sex. The results were similar with separate analyses by age group except that IP status was not significant for the 4-year-old group for the parent version of the Effort Scale. The results were similar when using logit scores.

There was a significant sex difference for both the parent version, t(499) = 3.72, p < .001, and the teacher version of the Effort Scale, t(499) = 4.02, p < .001. The scores of girls were higher than those of boys in both the teacher and parent versions. For the teacher version, the mean scores of girls and boys were 24.93 (95%CI: 24.51, 25.35) and 23.62 (95%CI: 23.15, 24.10). For the parent version, the mean scores of girls and boys were 24.27 (95%CI: 23.83, 24.71) and 23.10 (95%CI: 22.66, 23.53). There was also a significant age difference for the teacher version of the Effort Scale, t(499) = 2.76, p = .001. The scores of 5-year-old children (M = 24.67, 95%CI: 24.25, 25.08) were higher than those of 4-year-old children (M = 23.77, 95%CI: 23.27, 24.26).

3.2. Task Scale

3.2.1. Rasch analysis

Unidimensionality of the teacher version of the Task Scale was investigated through examination of the infit and outfit mean squares. All items were within the recommended range of 0.60–1.40 (Bond & Fox, 2007). The mean infit mean square
Table 3 Mean and 95% confidence interval scores.
Scale | Preschool children | IP children | Significance
4-Year-old children | n = 221 | n = 22 |
Effort (parent) | 23.46 [22.99, 23.93] | 22.50 [20.82, 24.18] | t(241) = 1.21, p = .228, d = 0.27
Effort (teacher) | 24.18 [23.69, 24.67] | 19.64 [17.99, 21.29] | t(241) = 5.48, p < .001, d = 1.23
Task (parent) | 13.97 [13.70, 14.24] | 13.32 [12.36, 14.28] | t(241) = 1.42, p = .156, d = 0.32
Task (teacher) | 13.98 [13.70, 14.25] | 11.86 [10.78, 12.94] | t(241) = 4.50, p < .001, d = 1.01
5-Year-old children | n = 236 | n = 22 |
Effort (parent) | 24.12 [23.69, 24.55] | 21.41 [19.45, 23.37] | t(256) = 3.51, p = .001, d = 0.72
Effort (teacher) | 24.86 [24.43, 25.28] | 22.64 [21.07, 24.21] | t(256) = 2.98, p = .003, d = 0.67
Task (parent) | 14.33 [14.11, 14.54] | 12.55 [11.44, 13.65] | t(256) = 4.53, p < .001, d = 1.01
Task (teacher) | 14.50 [14.28, 14.73] | 13.45 [12.71, 14.20] | t(256) = 2.65, p = .009, d = 0.59
Total | n = 457 | n = 44 |
Effort (parent) | 23.80 [23.48, 24.12] | 21.95 [20.71, 23.20] | t(499) = 3.32, p = .001, d = 0.53
Effort (teacher) | 24.53 [24.20, 24.85] | 21.14 [19.95, 22.32] | t(499) = 6.04, p < .001, d = 0.95
Task (parent) | 14.16 [13.98, 14.33] | 12.93 [12.22, 13.65] | t(499) = 4.04, p < .001, d = 0.64
Task (teacher) | 14.25 [14.07, 14.43] | 12.66 [11.98, 13.33] | t(499) = 5.12, p < .001, d = 0.81
was 0.96 (SD = 0.14) and the mean outfit mean square was 0.97 (SD = 0.26). The person reliability and separation measures were .72 and 1.59. The item reliability and separation measures were .99 and 10.15. Principal component analysis (PCA) of the residuals indicated that the variance explained by the measure was 63.0% and the variance explained by items was 22.6%, which was less than two times the variance explained by the first contrast (14.5%). The eigenvalue of the first contrast was 1.6, indicating that fewer than two items were involved. For the rating scale properties, the average measures increased monotonically from 0.53 for category 1 to 7.00 for category 4, but there was a violation for category 2 (−0.91). This was probably due to the score of one participant on item 1 (the only participant choosing category 1 on this item) whose rating on this item was totally different from the ratings on the other three items. The step calibrations also increased monotonically and they were more than 1.4 logits apart. There were 2 items with fewer than 10 responses per category and these were all category 1 responses. The results supported the unidimensionality of the teacher version of the Task Scale and provided justification for summing up the scores to form a total score. The results also indicated that teachers could meaningfully differentiate between the four categories. The item map (Fig. 3) indicated that the scale was relatively easy for the participants and there were not enough items to target participants in the higher range.
Unidimensionality of the parent version of the Task Scale was investigated through examination of the infit and outfit mean squares. Using the criteria of 0.60–1.40 (Bond & Fox, 2007), one item (item 3 – like to see improvement in own school work) had an outfit statistic (0.59) just outside the recommended range.
Bond and Fox (2007) suggested that relatively more weight should be given to infit statistics as the outfit statistics were not weighted and were more sensitive to outliers. The mean infit mean square was 0.99 (SD = 0.12) and the mean outfit mean square was 0.94 (SD = 0.20). The person reliability and separation measures were .56 and 1.13. The item reliability and separation measures were .99 and 11.59. Principal component analysis (PCA) of the residuals indicated that the variance explained by the measure was 59.1% and the variance explained by items was 24.4%, which was less than two times the variance explained by the first contrast (16.9%). The eigenvalue of the first contrast was 1.7, indicating that fewer than two items were involved. For the rating scale properties, the average measures increased monotonically from 2.70 for category 1 to 5.37 for category 4. The step calibrations also increased monotonically and they were more than 1.4 logits apart. There were 3 items with fewer than 10 responses for category 1. The results provided some support for the unidimensionality of the parent version of the Task Scale and justification for summing up the scores to form a total score. The results also indicated that parents could meaningfully differentiate between the four categories. The item map (Fig. 4) indicated that the scale was relatively easy for the participants and there were not enough items to target participants at the higher end.
In terms of DIF, for the parent version of the Task Scale, there was no item with item bias by sex, age or IP status. For the teacher version, there was a significant item contrast for item 4 (need to know that he/she is getting somewhere with school work), which was more difficult for IP children. There was no item bias for sex and age level.
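The person reliability and separation measures reported above are two views of the same quantity. As a minimal sketch, assuming the standard Rasch relation between reliability R and separation G, namely R = G²/(1 + G²) and hence G = √(R/(1 − R)), the reported pairs can be cross-checked as follows. The function name is illustrative and is not taken from Winsteps; small discrepancies are expected because the reliabilities are only reported to two decimal places.

```python
import math


def separation_from_reliability(reliability: float) -> float:
    """Separation index G implied by a Rasch reliability R, with G = sqrt(R / (1 - R))."""
    if not 0.0 <= reliability < 1.0:
        raise ValueError("reliability must lie in [0, 1)")
    return math.sqrt(reliability / (1.0 - reliability))


# Person reliabilities reported for the Task Scale (rounded to two decimals in the paper)
teacher_separation = separation_from_reliability(0.72)  # paper reports 1.59
parent_separation = separation_from_reliability(0.56)   # paper reports 1.13

print(f"teacher: {teacher_separation:.2f}, parent: {parent_separation:.2f}")
```

A person separation well below 2, as for the parent version, is in line with the item maps, which showed too few items targeting participants in the higher range.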
Fig. 3. Item map for teacher version of Task Scale.
The correlation between the raw scores and logit scores of the teacher version was .99 (n = 501). The correlation between the raw scores and logit scores of the parent version was also .99 (n = 501). These results suggested that the logit scores were meaningful representations of the raw scores.
3.2.2. Reliability
The reliability of the parent version was .77 and the reliability of the teacher version was .85. Test–retest reliability (intraclass correlation) for the teacher version was .57 (95%CI: .40, .70) (n = 82) and that for the parent version was .54 (95%CI: .37, .68) (n = 84).
3.2.3. Validity
In terms of its validity, the Task Scale (both parent and teacher versions) correlated positively with the PDAS cognitive sub-test and SDQ prosocial behavior scores and negatively with SDQ total problem behavior score. Children who were rated by their teachers and parents as having higher task motivation achieved higher scores on the PDAS cognitive sub-test items, and were rated by their parents as more prosocial and less disruptive in behavior. The details are in Table 2. Separate analyses by age group resulted in the same pattern except that for five-year-old children, the correlation between SDQ prosocial behavior and the teacher version of the Task Scale was not significant (r = .09, p = .133, n = 258). The results were similar when using logit scores.
Validity was further investigated by examining the difference between the IP group and the preschool group. Independent t test results indicated a significant difference between the two groups on both the parent and teacher versions of the Task Scale. The scores of IP students were lower than those of preschool students. The results were similar with separate analyses by age group, with the exception of the parent version of the Task Scale for 4-year-old children. The details are in Table 3. The results were similar when using logit scores.
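Table 3 gives Cohen's d next to each independent t test, but the paper does not state how d was computed. As a minimal sketch, assuming a pooled-standard-deviation d from an equal-variance t test, the two statistics are linked by d = t·√(1/n₁ + 1/n₂). The function name below is illustrative. Under this assumption the Task Scale rows for the total sample are reproduced; not every row of the table need agree exactly if a different formula was used.

```python
import math


def cohens_d_from_t(t: float, n1: int, n2: int) -> float:
    """Pooled-SD Cohen's d implied by an equal-variance independent-samples t statistic."""
    return t * math.sqrt(1.0 / n1 + 1.0 / n2)


# Total-sample Task Scale rows of Table 3: preschool n = 457, IP n = 44
d_task_parent = cohens_d_from_t(4.04, 457, 44)   # Table 3 reports d = 0.64
d_task_teacher = cohens_d_from_t(5.12, 457, 44)  # Table 3 reports d = 0.81

print(round(d_task_parent, 2), round(d_task_teacher, 2))
```

By Cohen's (1992) conventions, values of this size fall in the medium-to-large range.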
As the IP and preschool children differed in terms of family income, social welfare status, and sex, multiple regression was performed to examine the effect of IP status on Task Scale scores, after controlling for family income, social welfare status, and sex. The results were significant for the parent version, F(4, 462) = 7.46, p < .001, and the teacher version, F(4, 462) = 9.07, p < .001. The effect of IP status was significant for the parent version (b = .16, t = 3.50, p = .001) and teacher version (b = .23, t = 4.92, p < .001) after controlling for family income, social welfare status, and sex. The results were similar with separate analyses by age group except that IP status was not significant for the 4-year-old group for the parent version of the Task Scale. The results were similar when using logit scores.
Fig. 4. Item map for parent version of Task Scale.
There was a significant sex difference for both the parent version, t(499) = 3.48, p = .001, and the teacher version of the Task Scale, t(499) = 3.16, p = .002. The scores of girls were higher than those of boys in both the teacher and parent versions. For the teacher version, the mean scores of girls and boys were 14.41 (95%CI: 14.18, 14.65) and 13.85 (95%CI: 13.59, 14.10). For the parent version, the mean scores of girls and boys were 14.37 (95%CI: 14.14, 14.60) and 13.77 (95%CI: 13.52, 14.02). There was also a significant age difference for the teacher version of the Task Scale, t(499) = 3.52, p = .001.
There were also significant sex differences for child behavior, with girls having higher scores on prosocial behavior, t(499) = 4.85, p < .001, and lower scores on problem behavior, t(499) = 3.49, p = .001. There was no age difference for prosocial or problem behavior.
3.3. Comparison between the parent and teacher version
The correlation between the parent and teacher versions of the Task Scale was .27 (p < .001, n = 501) and that for the Effort Scale was .30 (p < .001, n = 501). The correlations between teacher ratings (Task Scale and Effort Scale) and PDAS scores (r = .35 for Task Scale and r = .38 for Effort Scale) were higher than those for parent ratings (r = .28 for Task Scale and r = .26 for Effort Scale). On the other hand, the parent ratings correlated more highly with SDQ scores (r = .37 to .46) than the teacher ratings did (r = .20 to .32). A dependent t test indicated that teachers rated children more positively on the Effort Scale than parents, t(500) = 3.09, p = .002. However, there was no difference in parents’ and teachers’ ratings on the Task Scale.
Separate analyses by age indicated that there was no difference in teachers’ and parents’ ratings on the Effort and Task Scales among the 4-year-olds, but teachers gave higher ratings on the Effort Scale (M = 24.67, 95%CI: 24.25, 25.08) than parents (M = 23.89, 95%CI: 23.45, 24.32) for 5-year-old children, t(257) = 3.06, p = .002. For separate analyses by sex, there was no difference in teachers’ and parents’ ratings for boys, but for girls, teachers (M = 24.93, 95%CI: 24.51, 25.35) gave higher ratings on the Effort Scale than parents (M = 24.27, 95%CI: 23.83, 24.71), t(231) = 2.56, p = .011.
4. Discussion
For convergent validity, it was hypothesized that the Effort and Task Scales should correlate with measures of children’s cognitive skills and social behavior. With regard to criterion-related validity, these two scales should be able to discriminate between children with typical development and children with developmental disabilities.
In terms of convergent validity, hypothesis one was supported. Both the parent and teacher versions of the Effort and Task Scales correlated positively with cognitive measures (PDAS cognitive sub-test scores) and prosocial behavior, and negatively with problem behavior. The results were consistent with the literature demonstrating that mastery motivation was associated with academic achievement and well-being (Kaplan & Maehr, 2007). In terms of criterion-related validity, hypothesis two was also supported. Children with developmental disabilities were rated lower on the Effort and Task Scales by their teachers and parents, compared with children with typical development. These results provided evidence that the Task and Effort Scales were promising instruments for identification of children who might need support in learning, and as general measures of mastery motivation in preschool children.
Rasch analysis results generally supported the unidimensionality of the Effort and Task Scales, providing justification for adding up the individual item scores to form a total score. The results on category functioning analysis also suggested that teachers and parents were able to meaningfully differentiate between the rating scale categories. DIF results indicated that item bias was not evident in most cases. These two scales, however, were relatively easy for Hong Kong preschool children and there were not enough items targeting the higher range of ability. The implication is that the Effort and Task Scales are likely to be useful for understanding children at the lower end of the ability range, who might need support in their learning.
The reliability estimates (Cronbach’s Alpha) of the Effort and Task Scales were both above .70, indicating satisfactory reliability. Test–retest reliability was around the .50–.60 range. According to Sattler (2008), reliability estimates above .70 are regarded as moderate or fair and estimates above .80 are considered good.
For intra-class correlation, the general principle is that the higher the intra-class correlation, the better the reliability (Yen & Lo, 2002). The reliability estimates of the Task and Effort Scales are considered satisfactory.
Teachers’ ratings were more closely associated with children’s cognitive development scores whereas parents’ ratings were more closely associated with children’s behavior or general well-being scores. Teachers might have more knowledge about children’s cognitive development and they also had the opportunity to observe children with a range of abilities. As the parents completed both the SDQ and the Task and Effort Scales, the correlations among these measures were higher. However, it should be noted that both teachers’ and parents’ ratings correlated with the PDAS cognitive sub-test, which was individually administered to children by trainee educational psychologists/research assistants not associated with the preschools.
Teachers tended to give higher ratings on the Effort Scale than parents. It is possible that teachers were more aware of the range of abilities of the children and had more opportunities to observe children’s learning behavior, so they were more appreciative of the effort made by students irrespective of the learning outcomes. Another possible explanation is that children might behave differently in school and at home.
The sex difference in motivation was consistent with the findings of Lepola (2004) and Yeung, Lau, and Nie (2011). It is possible that teachers’ or parents’ ratings of motivation might be affected by children’s behavior (Lepola, 2004), and girls displayed higher scores on prosocial behavior and lower scores on problem behavior. The age difference in motivation was also consistent with Lepola (2004), who argued that this could also be interpreted as an increase in self-regulated activities, which were part of the preschool curriculum.
There were some limitations to the present study.
First, random sampling was not used, though an effort was made to include preschools in all districts of Hong Kong. When compared with the 2006 by-census, there were more families with income below the population median household income in the present sample (Census and Statistics Department, 2007). This should be taken into consideration in the interpretation of the results. Second, there were fewer girls in the IP group and this might have biased the results. Multiple regression was used to control for the confounding variables and the results still indicated a significant effect for developmental status. Third, test–retest reliability was less satisfactory and the parents and teachers who participated in the retest phase were only convenience samples. Fourth, predictive validity and discriminant validity were not assessed.
5. Conclusion
The results provided encouraging support for the parent and teacher versions of the Task and Effort Scales as measures of motivation among Chinese preschool children. Their psychometric properties are satisfactory, in terms of reliability, validity, unidimensionality, item bias and category functioning. They are relatively short and easy to complete for teachers and parents of preschool children. They could potentially be useful tools for researchers on the learning of preschool children, and tools for the evaluation of early intervention programs. The availability of the parent and teacher versions provides flexibility for researchers, depending on their constraints in access to parents or teachers. The two versions could also provide researchers with observations from different parties, as a means of triangulation.
References
Aunola, A., Leskinen, E., & Nurmi, J. E. (2006). Developmental dynamics between mathematical performance, task motivation and teachers’ goals during the transition to primary school. British Journal of Educational Psychology, 76, 21–40.
Bello, I., Steffen, J. J., & Hayashi, K. (2011). Cognitive motivational systems and life satisfaction in serious and persistent mental illness. Quality of Life Research, 20, 1061–1069.
Bond, T., & Fox, C. (2007). Applying the Rasch model (2nd ed.). Mahwah, NJ: Lawrence Erlbaum Associates, Publishers.
Bulotsky-Shearer, R. J., Fernandez, V., Dominguez, X., & Rouse, H. L. (2011). Behavior problems in learning activities and social interactions in Head Start classrooms and early reading, mathematics, and approaches to learning. School Psychology Review, 40, 39–56.
Caselman, T. D., & Self, P. A. (2008). Assessment instruments for measuring young children’s social–emotional behavioral development. Children & Schools, 30, 103–115.
Census and Statistics Department. (2007). 2006 population by-census main report: Volume 1. Hong Kong: Census and Statistics Department, Hong Kong SAR Government.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155–159.
Glascoe, F. P. (2000). Evidence-based approach to developmental and behavioural surveillance using parents’ concerns. Child: Care, Health and Development, 26, 137–149.
Goodman, R., & Scott, S. (1999). Comparing the Strengths and Difficulties Questionnaire and the Child Behavior Checklist: Is small beautiful? Journal of Abnormal Child Psychology, 27, 17–24.
Grant, H., & Dweck, C. S. (2003). Clarifying achievement goals and their impact. Journal of Personality and Social Psychology, 85, 541–553.
Holmbeck, G. N., & Devine, K. A. (2009). Editorial: An author’s checklist for measure development and validation manuscripts. Journal of Pediatric Psychology, 34, 691–696.
Kaplan, A., & Maehr, M. L. (2007). The contributions and prospects of goal orientation theory. Educational Psychology Review, 19, 141–184.
Koth, C., Bradshaw, C. P., & Leaf, P. J. (2009). Teacher observation of classroom adaptation-checklist: Development and factor structure. Measurement and Evaluation in Counseling and Development, 42, 15–30.
Lai, K. Y. C., Luk, E. S. L., Leung, P. W. L., Wong, A. S. Y., Law, L., & Ho, K. (2010). Validation of the Chinese version of the Strengths and Difficulties Questionnaire in Hong Kong. Social Psychiatry and Psychiatric Epidemiology, 45, 1179–1186.
Lepola, J. (2004). The role of gender and reading competence in the development of motivational orientation from kindergarten to grade 1. Early Education and Development, 15, 215–240.
Leung, C., Mak, R., Lau, V., Cheung, J., & Lam, C. (2010). Development of a preschool developmental assessment scale for assessment of developmental disabilities. Research in Developmental Disabilities, 31, 1358–1365.
Leung, C., Sanders, M. R., Leung, S., Mak, R., & Lau, J. (2003). An outcome evaluation of the implementation of the Triple P-Positive Parenting Program in Hong Kong. Family Process, 42, 531–544.
Linacre, J. M. (2009). A user’s guide to Winsteps. Winsteps.com.
Maehr, M. L. (2001). Goal theory is not dead – Not yet anyway: A reflection on the special issue. Educational Psychology Review, 13, 177–185.
Maehr, M. L. (2008). Culture and achievement motivation. International Journal of Psychology, 43, 917–918.
McInerney, D. M., & Ali, J. (2006). Multidimensional and hierarchical assessment of school motivation: Cross-cultural validation. Educational Psychology, 26, 717–734.
McInerney, D. M., Marsh, H. W., & Yeung, A. S. (2003). Toward a hierarchical goal theory model of school motivation. Journal of Applied Measurement, 4, 335–357.
Mok, M. M. C., Wong, M. Y. W., & Lau, A. S. M. (2010). Norming report on the revised APASO for measuring affective and social outcomes of schooling. Hong Kong: Assessment Research Centre, The Hong Kong Institute of Education.
Salonen, P., Lepola, J., & Niemi, P. (1998). The development of first graders’ reading skill as a function of pre-school motivational orientation and phonemic awareness. European Journal of Psychology of Education, 13, 155–174.
Sattler, J. M. (2008). Assessment of children: Cognitive foundations (5th ed.). San Diego: Jerome M Sattler Publisher Inc.
Schunk, D. H. (2003). Self-efficacy for reading and writing: Influence of modeling, goal setting, and self-evaluation. Reading & Writing Quarterly, 19, 159–172.
Yen, M., & Lo, L. H. (2002). Examining test retest reliability: An intra-class correlation approach. Nursing Research, 51, 59–62.
Yeung, A. S., Lau, S., & Nie, Y. (2011). Primary and secondary students’ motivation in learning English: Grade and gender differences. Contemporary Educational Psychology, 36, 246–256.
Zhou, H., & Salili, F. (2008). Intrinsic motivation of Chinese preschoolers and its relationship with home literacy. International Journal of Psychology, 43, 912–916.