Measuring authoritative teaching


Sigrun K. Ertesvåg, Centre for Behavioural Research, University of Stavanger, 4036 Stavanger, Norway

Teaching and Teacher Education 27 (2011) 51–61

Article history: Received 13 February 2010; received in revised form 2 July 2010; accepted 5 July 2010.

Keywords: Authoritative teaching; Teachers; Confirmatory factor analysis; Longitudinal; Factorial invariance

Abstract

High quality measurements are important to evaluate interventions. The study reports on the development of a measurement to investigate authoritative teaching, understood as a two-dimensional construct of warmth and control. Through the application of confirmatory factor analysis (CFA) and structural equation modelling (SEM), the factor structure and measurement invariance are investigated. Generally, results suggest that the two-dimensional model of authoritative teaching has satisfactory psychometric properties for longitudinal measurement invariance, ensuring the measurement of the same concept over time. Different types of missing data in this study are discussed. Also, the relevance of such a study for professional development is addressed. © 2010 Elsevier Ltd. All rights reserved.

1. Introduction

Managing behaviour in class is one of the greatest challenges facing teachers worldwide (e.g. Kim, Stormont, & Espinosa, 2009; Lewis, 2006; Loizou, 2009; Manning & Bucher, 2003; Midthassel, 2006; Sokal, Smith, & Mowat, 2003; van Tartwijk, den Brok, Veldman, & Wubbels, 2009). This applies equally to beginning and more experienced teachers (Evertson & Weinstein, 2006). Also, creating a positive learning environment is of great concern for teachers in many countries, as pupils' relationships with their teachers have been shown to be important predictors of academic and social adjustment (e.g. Hamre & Pianta, 2005; Hughes, 2002; Hui & Sun, 2010; Roland & Galloway, 2002). Managing behaviour in class is often neglected in teacher preparation programs in many countries (Evertson & Weinstein, 2006). As a response to this internationally acknowledged problem, there has been an increasing focus from the government, researchers and schools in Norway on improving teachers' skills in managing behaviour and classroom settings. As a result, a proliferation of actions to improve teachers' skills has been employed, many based on intuitive appeal rather than systematic evidence. However, over the last few years, the pressure to document the effects of initiatives implemented in schools has increased from schools as well as the government (Nordahl et al., 2006). As a consequence, there is an increasing demand for knowledge in schools to enable practitioners to judge the quality of an intervention and its evaluation.


Also, there is a growing need for teacher educators to emphasise the quality of the teacher–pupil relationship and to underscore the knowledge needed to interpret evaluation reports. Moreover, measurements for the longitudinal investigation of authoritative teaching are needed to investigate the development of such a teaching style in schools in general. For example, it would be interesting to know whether, as a result of a stronger focus on authoritative teaching in teacher education and in-service training in Norway, teachers consider themselves to manage such a teaching style better in ten years' time than today. This information may provide the government as well as teacher educators with evidence of their success in improving schools in this field. Also, it is important to monitor change when intervening to improve authoritative teaching in a school or group of schools. To be able to investigate authoritative teaching, short and long term, measurements are needed that are grounded in sound theoretical and empirical research and are stable over time. The latter implies ensuring that the measurement actually investigates the same concept, here authoritative teaching. Otherwise we are in danger of measuring different concepts on different occasions and not changes in authoritative teaching. As part of the increased focus on evaluation, several studies have reported pupils' self-reports of improvement regarding misbehaviour and pupils' reports of teachers' support and monitoring (e.g. Ertesvåg, 2009; Olweus, 2004). However, although previous longitudinal studies have addressed related concepts (e.g. Brekelmans, Wubbels, & van Tartwijk, 2005; Pianta, Belsky, Vandergrift, Houts, & Morrison, 2008), teachers' self-reports of authoritative teaching are scarce, and few if any are longitudinal. This article discusses the development of a measurement to investigate the improvement of teachers' authoritative teaching.


One way of conceptualizing relevant, selected aspects of authoritative teaching, understood as a two-dimensional concept constituting warmth and control, is discussed. Also, measurement of teachers' self-reports of warmth and control is investigated, both in a cross sectional sample and in a longitudinal sample attending one of two school-wide interventions. Considerable evidence supports the beneficial effect of a close teacher–pupil relationship on children's academic and behavioural outcomes (see Ertesvåg, 2009 for an overview). The measurement developed emphasises authoritative teaching, as this teacher–pupil interaction model serves as an underpinning for the other intervention characteristics of several of the evidence-based interventions offered to Norwegian schools (Nordahl et al., 2006). Although, as part of an evaluation, this measurement may appear simple, easy to use and relatively easy to interpret, a series of statistical procedures were used to develop and validate it. These procedures are reported here in detail so that practitioners as well as teacher educators can be aware of its strong theoretical as well as empirical base and use it with confidence to evaluate teachers' authoritative teaching generally, and specifically when evaluating improvement efforts. Training in authoritative teaching has the potential to improve classroom practice and teachers' ability to manage classroom behaviour. However, without knowledge of the concept as well as sound instruments to measure improvement, schools are in danger of implementing actions with little, if any, knowledge of their effects.

1.1. Authoritative parent and teacher style

Baumrind (1991) used the dimensions of control/demandingness (control) and warmth/responsiveness (warmth) to derive a four-fold classification of parenting styles (Fig. 1). In the authoritarian style (high control/low warmth), parents place greater value on obedience and discipline. In the authoritative style (high control and warmth), parents set rules but are willing to explain the reasons for rules and are open to discussion. The permissive indulgent parent (low control, high warmth) has a lax attitude towards parenting and/or fails to provide rules for the child's behaviour. Permissive neglectful parents (low control and warmth) do not structure and monitor, are not supportive, or may be actively rejecting. Baumrind (1967, 1991) found that children whose parents have an authoritative style have the best outcomes on a number of behavioural and psychological measurements. Authoritative parents are involved with their children, providing close supervision and setting and enforcing limits on their behaviour. Yet this control orientation is combined with acceptance, respect for autonomy, and warmth. Baumrind's typologies have had a tremendous impact on research on parenting style (see Hughes, 2002; Maccoby, 1992; Pellerin, 2005 for an overview).

Fig. 1. Typology of parenting and teaching styles (adapted from Baumrind, 1991): a two-by-two classification by warmth (high/low) and control (high/low), with authoritative (high warmth, high control), indulgent (high warmth, low control), authoritarian (low warmth, high control) and neglectful (low warmth, low control) styles.

Baumrinds’ typology implies four different parent styles. When applied to teaching, the authoritative style has been the main focus. Recently, a growing body of research (e.g. Hughes, 2002; Patrick, Turner, Meyer, & Midgley, 2005; Pellerin, 2005; Roland & Galloway, 2002) has emphasised the importance of the authoritative teacher in positive teacherepupil relationships and Baumrinds’ authoritative parenting style has been adapted to teaching style (Baker, Clark, Crowl, & Carlson, 2009; Connor, Son, Hindman, & Morrison, 2005; Hughes, 2002; Wentzel, 2002). Authoritative teachers work to build relationships of warmth, acceptance and openness; they establish high standards and have high expectations of socially responsible behaviour; they enforce rules and standards in a firm and consistent manner while using reprimands and punitive strategies when necessary; and they promote autonomy by encouraging the pupil’s participation in decisions about his/her behaviour (Brophy, 1996; Hughes, 2002; Kounin, 1970). This combination aims at preventing problems and also has the dual purpose of managing behaviour in the short term and developing responsibility among pupils in the long term (Hughes, 2002; Pellerin, 2005). It can be argued that developing a measurement to investigate the concept of authoritative teaching through the two aspects of warmth and control, is in fact a study of all four categories derived from Baumrinds’ typology. On the other hand, when focusing on teacher training authoritative teaching is the most interesting considering the positive outcome it has on pupils. However, a theoretical discussion of teacher styles will call for a broader discussion of the relationship between warmth and control. In this study the authoritative perspective is the main focus of the measurements developed, as the authoritative teacher style is the focus of several improvement initiatives. The scales developed will emphasise to identify the authoritative style. Given this, the scales are expected to correlate as authoritative teaching style is characterised by high warmth and high control. Although relatively few studies use the term ‘authoritative teaching’, the concept of warm demanders as an optimal teaching style is not new (Kleinfeld, 1975; Morrison, 1974; Walker, 2008), and recent research continues to relate the processes of warmth/ responsiveness and control/demandingness to pupil outcomes, explicitly (Connor et al., 2005; Hughes, 2002; Walker, 2008; Wentzel, 2002) and implicitly (e.g. Cornelius-White, 2007; Hamre & Pianta, 2005; Hughes & Kwok, 2006). Teacherepupil relationships characterised by warmth and a commitment to the pupil’s behaviour and learning may affect the pupil’s achievement and social adjustment (e.g. Hamre & Pianta, 2005; Hughes, 2002; Wentzel, 2002). Similarly, authoritative teachers who monitor pupils’ behaviour and learning may buffer children from negative peer influences or a negative social background (Hughes, 2002). Careful monitoring of school work and behaviour seems to prevent or reduce behavioural problems (e.g. Good & Brophy, 2007). Doyle (1986) notes that monitoring individual progress can afford opportunities for corrective feedback, and that the teacher’s close proximity to the pupils can prevent misbehaviour starting. Bru, Stephens, and Torsheim (2002) found that pupils’ perception of emotional support was significantly and positively correlated with both academic support and monitoring. 
Teachers contribute to motivation for high achievement, greater school engagement and wellbeing among their pupils if they are supportive, responsive to their pupils’ needs and set and reinforce clearly defined standards for behaviour and achievement (Marchant, Paulson, & Rothlisberg, 2001; Roeser, Midgley, & Urdan, 1996). On the other hand, if the teacher is too demanding, pupils are likely to feel uncomfortable and anxious and to concentrate less in class because of excessive levels of task orientation and control (Kuntsche, Gmel, & Rehm, 2006; Moos, 1978).


As part of the examination of construct validity, warmth and control are related to other aspects of teaching. Based on research on overlapping, but not identical, concepts, as well as theory, it is reasonable to assume that collaboration and teacher certainty are positively correlated with the two aspects of authoritative teaching. Teacher collaboration has been found to improve teaching effectiveness (e.g. Goddard, Goddard, & Tschannen-Moran, 2007; Graham, 2007), and is presumed to be a powerful learning environment for teachers' professional development (Meirink, Meijer, & Verloop, 2007). Moreover, a growing body of research confirms that participation in more collaborative professional communities affects teaching practices (e.g. Ertesvåg, submitted for publication; Vescio, Ross, & Adams, 2008). Given this, it is reasonable to assume that collaboration (e.g. discussions, planning, observation, useful critiques, and teaching each other the practice of teaching (Little, 1982)) focusing on authoritative teaching is positively associated with authoritative teaching. Also, studies on classroom management emphasise that teachers need to be certain about their leadership qualities and strategies (e.g. McManus, 1989; Munthe, 2001). Munthe (2003a) reported a positive correlation between teacher certainty and some of the aspects of warmth and control, e.g. working actively to build relations with pupils and establishing routines for classroom activity. Therefore, it is expected that the two aspects of authoritative teaching are positively associated with aspects of teacher certainty.


Fig. 2. Hypothesised model of authoritative teaching as a two-dimensional concept of warmth and control: two correlated latent factors, warmth with observed indicators w1–w4 and control with observed indicators c1–c4.

2. Aims

This study represents one step along a path of investigating and discussing the concept of authoritative teaching through the two aspects of warmth and control. The larger study aims at improving in-service training for teachers in the field of authoritative teaching and managing pupil misbehaviour. Change initiatives grounded in sound theoretical and empirical knowledge are imperative for teachers' professional development. This also applies to measurements to investigate teachers' authoritative teaching. As part of the larger study, one way of measuring self-reports of authoritative teaching will be discussed. The factor structure of authoritative teaching (Fig. 2) was tested using confirmatory factor analyses (CFA) in two samples of teachers. A main aim was to develop scales that measure teachers' development of warmth and control throughout an intervention, so CFA was applied both to data from a random sample cross sectional study of teachers (sample 1) and to data from a sample of teachers attending one of two interventions aimed at improving authoritative teaching. The latter, sample 2, contained data from three time points or waves, in three consecutive years. Sample 1 was included to test the two-dimensional factor structure in a nationwide random sample in Norway. However, the main aim of this study was to investigate both the scales constructed for this research and the scores obtained. To do this, CFA was conducted based on the items and latent variables that made up authoritative teaching. Conceptually, authoritative teaching was hypothesised to be two-dimensional, comprising two distinct, but interrelated, factors of 'warmth' and 'control'. However, a rival one-factor model as well as an alternative two-level model were also tested.

3. Methods

3.1. Sample 1

3.1.1. Sample and procedure

A questionnaire was completed anonymously by 870 primary and secondary school teachers in 50 Norwegian municipalities as part of the tri-annual School Environment survey by the Centre for Behavioural Research (CBR) in 2001. Schools were selected randomly yet stratified, based on criteria developed by Statistics Norway (Statistics Norway, 1994). The sample contained 484 primary (grades 1–7), 209 secondary (grades 8–10) and 144 combined (grades 1–10) school teachers. Consent was obtained by voluntary participation based on a written description of the project according to standards prescribed by the Norwegian Data Inspectorate.

3.2. Sample 2

3.2.1. Sample and procedure

Nine hundred teachers attending one of two school-wide interventions were included. Two hundred and forty-three teachers at 10 schools attended the project "(Development of a) Handbook for classroom management" (Midthassel, 2006) and 657 teachers at 18 schools attended the Respect programme (Ertesvåg, 2009; Ertesvåg & Vaaland, 2007), which is aimed at preventing and reducing problem behaviour. The sample contained 325 primary, 394 secondary and 157 combined school teachers. In both interventions, strengthening authoritative teaching was a main principle. Both interventions lasted two years. A questionnaire was administered at three times: before (T1), one year into (T2) and at the end of (T3) the two-year interventions. One group participated in the Respect programme in 2006–2008 (7 schools), and two groups participated in 2007–2009 (11 schools in the Respect programme, 10 schools in Handbook for classroom management). Data collection was conducted in May each year. Although participating in different projects, all schools were part of the same data collection. Given this, the procedures for data collection were the same. Preliminary analysis of descriptive data did not reveal significant differences between the two projects for any of the items at the first wave (T1) (see Appendix for details). Therefore, the data from the two projects were pooled. At T2, item 8 revealed a significant difference between the two samples (p = .049). However, in terms of effect size (Cohen's d), a value of .17 is lower than the .20 considered a small effect. No significant differences were found at T3. Given this, there was additional support for the pooling of data. Consent was obtained by voluntary participation based on a written description of the project according to standards prescribed by the Norwegian Data Inspectorate. To safeguard anonymity, a contact person at each school administered a unique identification code for each participant, generated by the deliverer of the web-based questionnaire. The contact person was responsible for keeping the auto-generated identification codes stored under lock and key for use at the later data collections. This person did not have access to the collected data.

3.3. Instruments

Teachers' authoritative teaching was measured by self-report on two scales consisting of four items each (see Table 1). The scales were developed on the basis of the theoretical framework of warmth and control outlined above. The items for each concept were developed in consultation with a panel of experts on authoritative teaching. The experts were key personnel at CBR with more than 20 years' history of research and in-service training of teachers in the field of teacher–pupil relationships and authoritative teaching. These items were piloted in a nationally representative sample of 1153 teachers as part of the School Environment Survey in 1998. On the basis of the preliminary analysis results, some of the items were excluded and new ones were included in consultation with the expert panel. Cronbach's α was computed for each scale, and the values are reported in Table 1. For both samples, a questionnaire was administered to the participants: a paper-and-pencil version for sample 1 and a web-based version for sample 2. In both cases, the questionnaires contained scales that were suitable for validation of the warmth and control scales. A collaboration scale (4 items), covering categories found to be characteristic of schools where professional development takes place (Little, 1982), was included for both samples (Munthe, 2003b). Measurements of teachers' didactic (4 items), practical (4 items) and relational certainty (4 items) (Munthe, 2001, 2003b) were included for sample 2 only. Responses to all statements were given on a scale from 0 to 5, where 0 represents 'not at all true' and 5 represents 'completely true'. Mean scores on the different scales were calculated to test validity.
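For reference, the internal-consistency coefficient reported for each scale in Table 1 is the standard Cronbach's alpha for a k-item scale (here k = 4 for both warmth and control). The formula is given for the reader's convenience and is not reproduced from the article itself:

```latex
\alpha \;=\; \frac{k}{k-1}\left(1 \;-\; \frac{\sum_{i=1}^{k}\sigma^{2}_{Y_i}}{\sigma^{2}_{X}}\right),
\qquad X \;=\; \sum_{i=1}^{k} Y_i ,
```

where \(\sigma^{2}_{Y_i}\) is the variance of item i and \(\sigma^{2}_{X}\) is the variance of the scale sum.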

3.4. Data analyses

A major aim of this paper was to investigate the concept of authoritative teaching when it is understood as consisting of two distinct, but interrelated, latent variables, each made up of four observed items. As the warmth and control scales were constructed according to theoretical considerations, the two-factor solution among the items was hypothesised a priori and tested using CFA. Structural equation modelling (SEM) was used to investigate the stability of the two-dimensional model. The model was fitted to the data by means of the robust maximum likelihood procedure as implemented in Mplus (Muthén & Muthén, 1998–2007). The missing data method was applied, as this allowed for the use of all observations in the data set when estimating the parameters in the models. Conventional analyses were conducted using SPSS (Norusis, 2007). One-way analysis of variance (ANOVA) was applied to provide information on mean differences between groups. Spearman's r was used for correlations because of the ordinal nature of the data. In their recommendations for goodness of fit indices, Hu and Bentler (1999) suggested using a cut-off value close to .08 for the standardized root mean squared residual (SRMR) and supplementing it with indices like the Tucker-Lewis Index (TLI) and the Comparative Fit Index (CFI) with cut-off values close to .95. They also recommended including the root mean square error of approximation (RMSEA) with a cut-off value of about .06 or less. The RMSEA is supported by a 90% confidence interval (90% CI). Due to the relatively large number of subjects, traditional χ² tests may provide an inadequate assessment of model fit (Jöreskog, 1993). Given this, the χ² test is included, but not discussed.

3.5. Missing data

There were three different types of missing data, or missingness as the phenomenon is commonly referred to (Buhi, Goodson, & Neilands, 2008), in the study. The first two types applied to both samples, the third only to sample 2. Firstly, missingness occurred as a result of people not responding to the invitation to participate, usually referred to as response rate. Secondly, some teachers who participated did not, for different reasons, answer all items. There were several possible reasons for this, such as skipped questions, computer malfunction, etc.

Table 1
Descriptive statistics for items constituting warmth and control.

Warmth (α = .82–.83)
1. I work actively to create good relationships with my pupils
2. I show interest in each pupil
3. I often praise my pupils
4. I show the pupils that I care about them (not only when it comes to academic work)

Control (α = .80–.83)
5. I have established routines/rules for how the pupils are supposed to act when they change activity/workplace etc.
6. I have established routines/rules for how the pupils are supposed to act in plenary teaching sessions
7. I have established routines/rules for individual work
8. I am closely monitoring the pupils' behaviour in class

Note. M = item means; SD = standard deviation; Ske = skewness; Kurt = kurtosis; α = Cronbach's alpha for the scales measuring warmth and control. Rating format is 0–5, where 0 = never and 5 = very often. [Columns of M, SD, skewness and kurtosis for sample 1 (N = 870) and sample 2 (N = 900) at Times 1–3 not reproduced.]


Thirdly, longitudinal studies usually suffer from attrition (Jeličić, Phelps, & Lerner, 2009), and this was also the case for this study, partly because of the study design. Teachers who left the schools dropped out of the study. New teachers starting work at the schools were included at the second and third waves, as the sample at each wave was teachers currently working at the school. The teachers' dropout and inclusion were, most likely, not related to the programme. It seems reasonable to believe that teachers left schools for reasons other than the schools' participation in a school-wide programme. For this reason, the missingness may be considered missing at random (MAR) (Buhi et al., 2008). When data are missing at random, incomplete data arise not from the missing values themselves; rather, the missingness is a function of some other observed variable in the data. Also, both interventions contained procedures for introducing new teachers to the theoretical grounding and principles of the intervention and including them in the work at the school. For example, new teachers were invited to participate in seminars at other schools just starting the intervention, and they were included in the activities among the pupils as well as in formal and informal discussions among staff. Given this, it made sense to include them in the sample although they had not been involved from the beginning. The response rate for sample 1 was 69%. The overall response rates for the three waves for sample 2 were 66%, 70% and 59% respectively. The response rate varied from 9% to 100% between schools and waves. The decrease in response rate at the third wave was mainly due to low response rates at five schools. A response rate of 9% is surely too low, and excluding teachers from schools with low response rates was considered. However, when consulting the external change agents assisting the schools in implementing the interventions, it became clear that these schools also were among those struggling most with implementing the programme. As the analyses were conducted at teacher level, not at school level, it was decided to keep the teachers in the sample. It is reasonable to assume that teachers at these struggling schools are less likely to be able to improve their levels of warmth and control. The measure's ability to detect no change, i.e. to remain stable when no change has occurred, is an important indication of the sensitivity of the scales. Both samples contained item non-response. For sample 1 and the three waves for sample 2, 95%, 94%, 93% and 95% respectively of the respondents reported on all items. Of those who did not respond to all items, most did not respond to 1 or 2 items. Item non-response did not therefore seem to be a major problem in the study. Attrition was expected, as the longitudinal sample included teachers working at the school at each time point. An implication is that the longitudinal sample 2 contains missingness due to non-responsiveness, item non-response and attrition. Attrition can be separated into people withdrawing from the study for reasons other than leaving the school and those missing by design, i.e. those not included in the sample at all time points because they were not working at the school. At the three time points, 570, 605 and 506 teachers, respectively, responded to the invitation to participate.
Two hundred and fifty-three teachers participated at all three time points, and 540 teachers participated at least twice. Some 360 of the 900 teachers included in the sample participated only once. At the second time point, 77 teachers reported that they had worked at the school one year or less, implying that they could not have participated at the first data collection. At the third wave, the number of teachers who reported that they had been working at the school one year or less was 64. As the number of teachers in the schools was more or less consistent, most of these new teachers were replacing a teacher who had been invited to the study and thus had dropped out of the study if participating. Given that these teachers replaced teachers dropping out of the study because they were leaving the school, 282 of the 360 were missing by design. The remaining 78 (8.7%) who only responded once dropped out for other reasons, including being sick at the time of data collection, parental leave, leave of absence, ignoring subsequent invitations, etc. It should be noted that there is some uncertainty related to this estimate, as it is unlikely that all teachers who were replaced actually participated in the programme or that all new teachers participated in the study. The lowest number of teachers participated in the study at T3. There might be several reasons for this; for example, new teachers may have thought their responses were not so important since they were new to the intervention. Demographic statistics for the group of teachers who participated in all three waves and for groups of teachers who participated in one or two are presented in Table 2. All demographic statistics were estimated as means of the individual time points, since not all teachers participated at all three time points. The results indicate that males (53%) were more likely than females (48%) to participate at all three time points. Also, the younger the teacher, the less likely he or she was to participate in all three waves. The exception was teachers older than 61 years. This is not unexpected, as teachers in this age group tend to retire. Younger teachers may be more likely than older teachers to have parental leave, which in Norway is 46–56 weeks. Also, due to lack of experience, they may be less likely to get a permanent job and may have to leave the school because their contract ends. The same tendency was found regarding seniority, defined as number of years' experience as a teacher. Also, the mean number of years working at the school was significantly lower for the teachers who participated once or twice (M = 7.7, SD = 7.67) than for the teachers who participated in all three waves (M = 11.9, SD = 8.33). The difference was significant, F(1,873) = 50.68, p = .000.

4. Results

4.1. Descriptive statistics

The means, standard deviations, skewness and kurtosis of the distribution of teachers' self-reports in both samples, and for all waves of sample 2, are presented in Table 1. The table contains information for the eight items for both samples and all waves.

Table 2
Analyses of attrition. Demographic statistics for teachers taking part at all waves and teachers taking part at one or two waves.

| | Participants taking part at all waves (N = 253) | Participants taking part at one or two waves (N = 540) |
|---|---|---|
| Gender | | |
| Female (N = 638) | 48% | 52% |
| Male (N = 245) | 53% | 47% |
| Age | | |
| 25 years or younger (N = 32) | 3% | 97% |
| 26–30 years (N = 154) | 16% | 84% |
| 31–40 years (N = 298) | 23% | 77% |
| 41–50 years (N = 168) | 36% | 64% |
| 51–60 years (N = 191) | 44% | 56% |
| 61 years or older (N = 47) | 30% | 70% |
| Seniority | | |
| Less than 5 years (N = 220) | 14% | 86% |
| 5–10 years (N = 207) | 21% | 79% |
| 11–20 years (N = 222) | 37% | 64% |
| More than 20 years (N = 235) | 29% | 72% |
| Mean years at the school | 11.9 (SD = 8.33)*** | 7.7 (SD = 7.67) |

***p < .001.


The data showed a normal univariate distribution, given that most skewness and kurtosis values fell within the range of −1.0 to +1.0 for both samples (Muthén & Kaplan, 1985, 1992) (see Table 1). Muthén and Kaplan (1992) and Curran, West, and Finch (1996) suggested there might be significant problems in estimation when univariate standardized skewness has an absolute value of 2.0 or larger, when standardized kurtosis has an absolute value of 7.0 or larger, or when both are true. Skewness and kurtosis for all items in both samples were well within these values.

4.2. CFA models

Analyses were conducted in two blocks. In the first block, the measurement structure of the four items referring to warmth and the four items referring to control was examined. Specifically, it was tested a) whether the hypothesised two-factor model could be found in cross sectional data, and b) if found, whether the two-factor measurement model (i.e. the specific factor loadings) would be invariant across time. The latter was done to ensure that the same concept is measured at different time points. A practical fit index approach was employed to test invariance of the measurement structure. The fit indices indicate how well the hypothesised theoretical model, in this case authoritative teaching as a two-dimensional concept of warmth and control represented by four items each (see Fig. 2), is supported by the data from the longitudinal sample. Little (1997) suggested model invariance can be assumed a) if the overall model fit is acceptable, as indicated by fit indices; b) if the difference in fit is negligible (e.g. .05 for the CFI, TLI, or similar indices) after introduction of the equality constraints; and c) if the justification for the accepted model is substantially more meaningful and the interpretation more parsimonious than the alternative model. In addition, the recommendations by MacCallum, Browne, and Sugawara (1996) were followed. The 90% confidence interval (CI) around the RMSEA was employed to evaluate model fit and for nested model comparisons. Specifically, if the upper bound of the CI is equal to or lower than .05, a close fit of the model to the data can be assumed. Moreover, if the CIs of subsequent nested models overlap with those of preceding, less constrained models, the more parsimonious model is deemed acceptable.

4.2.1. Testing the measurement structure

In the first step of the analyses, it was tested, using CFA, whether the hypothesised two-factor model (Fig. 2) could be found in the data. To do this, an initial model was specified in which the four warmth items loaded on one factor and the four control items loaded on the other. The variances of the two latent factors were fixed to 1.0 so that all factor loadings could be freely estimated. Also, the two latent factors were free to correlate. The initial model was tested both in the representative nationwide cross sectional sample (sample 1) and in the first wave of the longitudinal sample (sample 2). Fit indices for the different factor models tested, for both samples and for the different time points in sample 2, are presented in Table 3.
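To make the specification just described concrete, the following is a minimal, illustrative sketch of the hypothesised two-factor CFA in open-source software. It is not the procedure used in the study: the analyses reported here were run with robust maximum likelihood in Mplus, the item labels (w1–w4, c1–c4) follow Fig. 2 rather than an actual data file, 'teachers.csv' is a hypothetical file name, and semopy's default identification fixes each factor's first loading rather than the factor variances.

```python
import pandas as pd
from semopy import Model, calc_stats

# Hypothesised two-factor model (Fig. 2): four warmth items, four control
# items, and a freely estimated covariance between the two latent factors.
MODEL_DESC = """
warmth  =~ w1 + w2 + w3 + w4
control =~ c1 + c2 + c3 + c4
warmth ~~ control
"""

data = pd.read_csv("teachers.csv")   # hypothetical file: one column per item

cfa = Model(MODEL_DESC)
cfa.fit(data)                        # maximum likelihood estimation
print(calc_stats(cfa).T)             # chi-square, df, CFI, TLI, RMSEA, etc.
```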
4.2.1.1. Cross sectional samples. First, the hypothesised two-factor model was tested. Also, an alternative one-factor model, in which all eight items loaded on a single factor, was tested to rule out the possibility that this model fitted the data better than the hypothesised two-factor model. The initial two-factor model showed acceptable fit to the data both for sample 1 (χ²(19) = 39.43, p = .004; CFI = .99; TLI = .98; SRMR = .024; RMSEA = .036; 90% CI = .020–.051) and for sample 2 (χ²(19) = 29.39, p = .060; CFI = .99; TLI = .99; SRMR = .027; RMSEA = .031; 90% CI = .000–.052), suggesting that the hypothesised two-factor structure of warmth and control was supported by the data.

Table 3
Fit indices for models of warmth and control.

| Model | χ² | df | CFI | TLI | SRMR | RMSEA (90% CI) |
|---|---|---|---|---|---|---|
| Study 1 | | | | | | |
| Two-factor model | 39.43 | 19 | .99 | .98 | .024 | .036 (.020–.051) |
| One-factor model | 352.88 | 20 | .88 | .83 | .061 | .114 (.101–.127) |
| Study 2, Time 1 | | | | | | |
| Two-factor model | 29.39 | 19 | .99 | .99 | .027 | .031 (.000–.052) |
| One-factor model | 192.79 | 20 | .87 | .81 | .062 | .123 (.108–.123) |
| Time 2 | | | | | | |
| Two-factor model | 54.89 | 19 | .97 | .96 | .035 | .056 (.039–.073) |
| Time 3 | | | | | | |
| Two-factor model | 46.78 | 19 | .98 | .97 | .031 | .054 (.034–.073) |
| Warmth (T1–T3) | 57.28 | 53 | 1.00 | 1.00 | .056 | .010 (.000–.024) |
| Control (T1–T3) | 129.58 | 53 | .96 | .95 | .077 | .040 (.032–.049) |
| Combined model (T1–T3) | | | | | | |
| Freely estimated | 536.05 | 237 | .94 | .93 | .049 | .038 (.033–.042) |
| Factor loadings equal over time | 440.93 | 245 | .96 | .96 | .075 | .030 (.025–.034) |
| Stability model (T1–T3) | 495.62 | 247 | .94 | .95 | .147 | .034 (.029–.038) |

Note. χ² = chi square; df = degrees of freedom; CFI = Comparative Fit Index; TLI = Tucker–Lewis Index; SRMR = standardized root mean squared residual; RMSEA = root mean square error of approximation, supported by a 90% confidence interval (90% CI).
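For readers less familiar with the indices in Table 3, the RMSEA values reported there are based on the familiar definition below, which expresses model misfit per degree of freedom and per participant. Note that with the robust maximum likelihood estimator used in this study the χ² entering the computation is a scaled statistic, so the tabled values cannot be reproduced exactly from this simple expression.

```latex
\mathrm{RMSEA} \;=\; \sqrt{\frac{\max\!\left(\chi^{2} - df,\; 0\right)}{df\,(N-1)}}
```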

To further test the hypothesis of a two-factor structure in the data, a rival one-factor model was specified, partly because of a high factor correlation (r = .72 in both samples) between warmth and control, which can indicate a one-factor structure. The goodness of fit indices for the one-factor solution did not meet the suggested criteria for either of the samples (sample 1: χ²(20) = 352.88, p = .004; CFI = .88; TLI = .83; SRMR = .061; RMSEA = .114; 90% CI = .101–.127; sample 2: χ²(20) = 192.79, p = .000; CFI = .87; TLI = .81; SRMR = .062; RMSEA = .123; 90% CI = .108–.123). Given that the respective CIs associated with the RMSEA for the one- and two-factor models did not overlap either (MacCallum, Browne, & Cai, 2006; MacCallum et al., 1996), this finding indicated that the two-factor structure was indeed a better representation of the data. Clustered sampling, i.e. teachers within schools, enabled the testing of a two-level factor model. The results for sample 1 and sample 2 were contradictory. For sample 1, design effects larger than 2, a rule-of-thumb value indicating the relevance of a two-level model, on all items, and intraclass correlations (ICC) exceeding .10 for all but one item, indicated a two-level structure (Muthén, 1997). However, for sample 2 only 4 out of 8 items revealed a design effect larger than 2. Of these, 3 barely exceeded 2 (2.01, 2.05, 2.08). Together with low ICCs (.02–.12), only exceeding .10 for 1 item, a two-level model was not justified. For further investigation, a two-level model for waves 2 and 3 of sample 2 was also tested. Although there were indications of more variance at school level for waves 2 and 3, design effects did not exceed 2 for all items (7 of 8 at wave 2 and 4 of 8 at wave 3). Also, even though higher than for wave 1, the ICC only exceeded .10 for one item at each wave. Given this, there was no strong support for a two-level model in sample 2. Based on the fact that the representative sample 1 indicated a two-level model, it could be argued that a two-level approach was appropriate. However, given that invariance of the measure as tested in sample 2 did not indicate a two-level model, a one-level model was chosen. The indices in the different estimated two-factor models (see Table 3) revealed an acceptable fit to the observed data (χ²(19) = 29.39–54.89, p = .000–.060; CFI = .97–.99; TLI = .96–.99; SRMR = .024–.035; RMSEA = .031–.056; 90% CI = .000–.073). The model's standardized solutions and two-factor correlations for sample 1 and sample 2 at time point 1 are presented in Fig. 3.

Fig. 3. Measurement model and factor correlations (standardized metric) of authoritative teaching understood as consisting of the two factors warmth and control. The first parameter estimate and fit indices refer to sample 1 and the second to sample 2, time 1. Goodness of fit: χ²(19/19) = 39.43/29.39; CFI = .99/.99; TLI = .98/.99; SRMR = .024/.027; RMSEA = .036/.031, 90% CI = .020–.051/.000–.052. ***p < .001. [Path diagram not reproduced.]

In the figure, lines directed from a latent factor (e.g. warmth) to a particular observed variable (e.g. item 1) denote the relationship between that factor and that item. These relationships are interpreted as factor loadings. The model solutions and factor correlations for T2 and T3 in study 2 did not differ substantially from T1 and are therefore not presented. Residual variances were checked for all models tested; none of them were negative and they are therefore not elaborated on further. Because some of the factor loadings of items in the same factor differed by as much as .20, possible cross-loadings of these items on the other factor were examined following a procedure of inspecting the standardized expected parameter change (SEPC) in the Mplus output (Kaplan, 1989). A cross-loading indicates that an item loads not only on the factor it is hypothesised to load on, but also on the other factor, implying that the theoretical model does not fit the data perfectly. The cross-loadings ranged between .22 and .25 across the two samples. Most of these values were too small to indicate the presence of a cross-loading of any of the items on the factor they were not hypothesised to be part of. The highest cross-loading was found for item 8 (see Table 1) for sample 2. However, the loading on the control factor (.70) was more than twice as large as the loading on the warmth factor (.25), which is a commonly suggested strict rule for rejecting the existence of a cross-loading (Hinkin, 1998). Item 8 is about monitoring the pupils' behaviour in class. Apart from the general relation between warmth and control as two aspects of authoritative teaching, the item is clearly related to the concept of control. Given this, there is little theoretical support for the cross-loading. Also, allowing the item to load on both factors resulted in poorer model fit for both samples and all waves. Performing the same post-hoc analysis for the second and third waves of sample 2 gave additional support to the conclusion that there were no such cross-loadings, as the items clearly met the criterion of 'twice as high loadings', with loadings of .29 and .24 on warmth and .67 and .72 on control. Therefore, there was neither theoretical nor empirical support for these cross-loadings. All the results presented so far supported a two-factor model.
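As background to the two-level checks reported earlier in this section, the rule-of-thumb design effect of 2 is, under the usual approximation, a simple function of the intraclass correlation and the average cluster (here, school) size. This standard expression is added for the reader's convenience and is not taken from the article itself:

```latex
\mathrm{DEFF} \;=\; 1 + (\bar{c} - 1)\,\rho_{\mathrm{ICC}}
```

where \(\bar{c}\) is the average number of teachers per school and \(\rho_{\mathrm{ICC}}\) is the intraclass correlation of the item. For example, with an ICC of .10, a design effect of 2 is reached at an average cluster size of about 11 teachers per school.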


4.2.1.2. Longitudinal sample. Next, whether the two-factor model was invariant over time was tested. To do this, a model was specified where, at each of the three time points, the four warmth items loaded on one factor and the four control items loaded on another. Residual variances of the same indicator were allowed to correlate across the three time points. In addition, the latent factors were free to correlate at the same time points. Although perhaps not strictly necessary, as warmth and control were investigated as part of a two-dimensional concept, each scale was examined for factor invariance over time as part of a step-by-step investigation. Confirmatory factor analyses were performed to investigate the fit indices for the warmth and control factors across time separately. Fit indices across the three time points indicated good fit for warmth (χ²(53) = 57.28, p = .319; CFI = 1.00; TLI = 1.00; SRMR = .056; RMSEA = .010; 90% CI = .000–.024) and control (χ²(53) = 129.58, p = .000; CFI = .96; TLI = .95; SRMR = .077; RMSEA = .040; 90% CI = .032–.049). Given this, a combined two-dimensional model was tested. All factor loadings, residual variances, and correlations among the latent factors were freely estimated, because no equality constraints were imposed across the three time points in this initial model. This first step establishes a baseline against which the imposition of further constraints, indicated by the hypothesised model, could then be tested (e.g. Little, 1997). Although it did not quite meet the criteria of Hu and Bentler (1999) for all fit indices, this initial model showed acceptable fit to the data (χ²(237) = 536.05, p = .000; CFI = .94; TLI = .93; SRMR = .049; RMSEA = .038; 90% CI = .033–.042), suggesting that a two-factor structure of warmth and control represented the relationships among the observed authoritative teaching items. To further test the existence of a two-factor longitudinal model, factor invariance was tested by holding the factor loadings equal over time. The model provided good fit according to the recommendations of Hu and Bentler (1999) (χ²(245) = 440.93, p = .000; CFI = .96; TLI = .96; SRMR = .075; RMSEA = .030; 90% CI = .025–.034). In general, to secure factorial invariance over time, factor loadings and item intercepts are constrained to remain equal over time. Item intercepts reflect mean values on specific items, and in order to secure completely invariant measurement of concepts across time, item intercepts are constrained to remain the same across time points. Given that the difference between the unconstrained and the constrained model was negligible and the final model gave the most meaningful solution, the criteria for measurement invariance suggested by Little (1997), outlined above, were met.
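The decision rules just applied can be summarised in a small helper function. The sketch below is illustrative only, not code used in the study: it hard-codes a simplified reading of the criteria (acceptable fit of the constrained model, a negligible drop in CFI/TLI, and overlapping RMSEA confidence intervals), and the fit values are taken from Table 3 for the freely estimated and equal-loadings models.

```python
def ci_overlap(ci_a, ci_b):
    """True if two (lower, upper) RMSEA confidence intervals overlap."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]


def invariance_plausible(free, constrained, max_delta=0.05):
    """Rough check of the criteria described in the text: acceptable fit of
    the constrained model (Hu & Bentler, 1999), a negligible change in
    CFI/TLI after imposing equality constraints (Little, 1997), and
    overlapping RMSEA confidence intervals (MacCallum et al., 1996)."""
    fit_ok = constrained["cfi"] >= 0.95 and constrained["srmr"] <= 0.08
    small_change = (abs(free["cfi"] - constrained["cfi"]) <= max_delta and
                    abs(free["tli"] - constrained["tli"]) <= max_delta)
    return fit_ok and small_change and ci_overlap(free["rmsea_ci"],
                                                  constrained["rmsea_ci"])


# Fit values from Table 3: combined model, freely estimated vs. loadings equal.
free        = {"cfi": 0.94, "tli": 0.93, "srmr": 0.049, "rmsea_ci": (0.033, 0.042)}
constrained = {"cfi": 0.96, "tli": 0.96, "srmr": 0.075, "rmsea_ci": (0.025, 0.034)}
print(invariance_plausible(free, constrained))  # True
```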

4.2.2. Testing the measurement stability

The combined model CFA, including warmth and control at all three waves, resulted in an acceptable fit. When the stability coefficients from one wave to another were introduced, as portrayed in Fig. 4, the model fit was slightly worse (χ²(247) = 495.62, p = .000; CFI = .95; TLI = .94; SRMR = .147; RMSEA = .034; 90% CI = .029–.038), as the SRMR was now higher than the recommended .080. However, it was considered acceptable, as the other fit indices were within or close to the recommended values. The results for this model indicated high stability from one wave to the next for both warmth (β = .73, p = .000 and β = .46, p = .000) and control (β = .57, p = .000 and β = .58, p = .000). The coefficient remained significant over two years for warmth (β = .36, p = .008). The coefficient for control, however, did not reveal significant stability at the 5% level, although it did at the 10% level (β = .13, p = .073). Significant correlations between warmth and control were revealed at each of the three waves (r = .67, p = .000; r = .81, p = .000; and r = .84, p = .000). Based on these results, the stability of the model is considered acceptable.

5. External validity

As part of the validation, scores on the scales were related to other aspects of teaching. From the results presented in Table 4, it is evident that, as expected, the warmth and control scales both correlated significantly with cooperation with colleagues for both studies, and with didactic, practical and relational certainty, which were investigated in sample 2 only.

6. Discussion

The first aim of this study was to investigate the concept of authoritative teaching through self-reports of the two aspects of warmth and control. Training in authoritative teaching has potential in helping teachers develop effective teaching and classroom management practices. Measures enabling the investigation of changes in the level of authoritative teaching are imperative for reliable and valid evaluation of the success of professional development in this field. The results provide support for warmth and control as two dimensions of authoritative teaching. Using CFA, a two-factor measurement structure was found both in two cross sectional samples and in a longitudinal sample. Although there is quite some overlap between the two concepts, they also measure distinct features. These results support the hypothesis of a two-dimensional concept of authoritative teaching. This structure showed a better approximation for both samples than the rival one-factor model tested in the study. The two factors also turned out to be highly reliable according to Cronbach's alpha coefficients. The strengths of the study lie in the statistical methodology used (CFA, SEM), the longitudinal data design, and the two-sample procedure used for the cross sectional data, allowing replication of the results in a representative sample.

Fig. 4. Stability model of the two-factor structure of authoritative teaching. Goodness of fit: χ² = 495.62, df = 247; CFI = .94; TLI = .95; SRMR = .147; RMSEA = .034, 90% CI = (.029–.038). **p < .01; ***p < .001. [Path diagram not reproduced.]


Table 4
Correlations (Spearman's r) between the two aspects of authoritative teaching and collaboration and three aspects of teacher certainty.

| | Sample 1 Warmth | Sample 1 Control | Time 1 Warmth | Time 1 Control | Time 2 Warmth | Time 2 Control | Time 3 Warmth | Time 3 Control |
|---|---|---|---|---|---|---|---|---|
| Collaboration | .17*** | .23*** | .37*** | .38*** | .35*** | .35*** | .33*** | .31*** |
| Teacher certainty (a) | | | | | | | | |
| Didactic | | | .27*** | .38*** | .24*** | .36*** | .28*** | .25*** |
| Practical | | | .32*** | .43*** | .22*** | .32*** | .28*** | .27*** |
| Relational | | | .42*** | .47*** | .35*** | .33*** | .32*** | .32*** |

Note. Sample 1: N = 870; sample 2 (Times 1–3): N = 900. ***p < .001. (a) Teacher certainty was measured among sample 2 teachers only.
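As an aside, the rank correlations in Table 4 are of the kind that can be computed with any standard statistical package; the study used SPSS, so the following SciPy-based sketch with made-up scale scores is purely illustrative.

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical scale mean scores (0-5 rating format) for six teachers.
warmth = np.array([4.5, 4.0, 3.8, 4.8, 4.2, 3.5])
collaboration = np.array([3.9, 3.5, 3.4, 4.4, 3.2, 3.0])

rho, p_value = spearmanr(warmth, collaboration)
print(f"Spearman's rho = {rho:.2f} (p = {p_value:.3f})")
```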

However, the study has certain limitations that should be addressed. First, it cannot be ruled out that the response rate of 69% in sample 1 has affected the degree to which the sample is representative. Second, it should be borne in mind that, in the light of the fit indices, the stability model was not faultless. Although the longitudinal confirmatory factor analysis provided good fit according to the recommendations of Hu and Bentler (1999) and indicated time invariance, the SRMR and TLI values of the stability model did not quite meet the cut-off criteria. However, because the TLI value of .94 was close to the cut-off and the other fit indices met the criteria, the model was considered acceptable. The measurement would benefit particularly from an investigation of whether it is invariant over time in a longitudinal sample not participating in an intervention. If the model is not time invariant in such a sample, this may be problematic, as the model would then not be invariant across groups with and without intervention. This would limit the applicability of the measurement to different types of studies. The inconsistent results between samples 1 and 2 regarding the variance distribution at school and teacher level call for closer examination. A possible reason that the longitudinal sample did not indicate a two-level model may be that schools taking on an intervention aimed at improving authoritative teaching have a special interest in the field (Midthassel & Ertesvåg, 2008). However, future studies will benefit from a closer examination of this issue. Also, it would be interesting to investigate the sensitivity of the measurement, that is, the capacity of the scales to detect meaningful change over time. This involves two issues: first, the measure must detect meaningful change when it has occurred, and second, it must remain stable when no change has occurred. However, the longitudinal model applied to an intervention sample did meet the criteria for assuming model invariance outlined by Little (1997) referred to above, and this strengthens the study.

A note on missingness needs to be added. Although it does not meet textbook suggestions, the response rate of 69% for the nationally representative sample was acceptable considering the response rates of earlier studies of teachers, e.g. 61% (Munthe, 2001) and 70% (Midthassel, Bru, & Idsøe, 2000). The response rates of the intervention sample were 66%, 70% and 59%, respectively, for the three waves. At sample level, the response rates, especially in the third wave of sample 2, were low. It is not uncommon for there to be lower than desirable response rates in longitudinal designs (e.g. Idsoe, Hagtvet, Bru, Midthassel, & Knardahl, 2008; Mäkikangas et al., 2006). Although some sample attrition was found, an estimated attrition not due to the design of less than 10% is considered acceptable and is less than for many longitudinal studies (Jeličić et al., 2009). However, the possibility that the samples were somehow selected cannot be ruled out. The dropouts in the longitudinal sample did not appear to be caused by the interventions. Moreover, sample 2 was not a random sample. Given this, the results cannot be generalized, for example, to samples of teachers participating in any intervention. The application of the maximum likelihood method in the present study allowed for the use of all observations in the data set when estimating the parameters in the models. A major advantage of including all data is improved statistical power. Traditionally, missing data have been handled using techniques that are statistically problematic, such as listwise deletion, pairwise deletion or mean substitution (Buhi et al., 2008; Jeličić et al., 2009). The possibility of drawing on all available data was a strength of the study.

In general, the longitudinal design supported a model that was time invariant, that is, the same concepts were measured at all three waves. The finding of a stable measurement structure of warmth and control over time is important because, without such invariance, a meaningful examination of change in the mean level of the two aspects of authoritative teaching would not be possible. The stability of the measurement structure suggests that teachers' specific expressions of warmth (e.g. showing interest in each pupil, praise) and control (e.g. establishing routines, monitoring) seem to be used, albeit not necessarily to the same extent, over time. As noted earlier, without such invariance one is in danger of measuring concepts other than warmth or control at the different time points instead of measuring change in the two dimensions of authoritative teaching.

In professional development it may be imperative to investigate whether an intervention has the expected outcome. A call for more evidence-based professional development also calls for measurements with a sound theoretical and empirical grounding with which to evaluate it. Without such measures, schools are in danger of implementing change efforts without any knowledge of their effect. For longitudinal studies, the stability of the measurements over time secures the ability to investigate authoritative teaching, and not something else, over time. Along with measurements of teachers' self-reports, studies of authoritative teaching would benefit from other information such as pupil reports and/or observations. Although it has been concluded that pupil ratings can be used in assessing the quality of teaching, there is controversy about whether pupils should be used to rate teachers' behaviour (e.g. den Brok, Brekelmans, & Wubbels, 2006; Jong & Westerhof, 2001; Kunter & Baumert, 2006). Therefore, teacher self-reports may provide additional information when studying teachers' teaching styles. Moreover, information from different groups of respondents will improve the richness of information, and teacher self-reports may provide important insight into teachers' perception of their level of authoritative teaching. This may be information of great interest in teachers' professional development.
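To illustrate the point about statistical power, the difference between listwise deletion and drawing on all available observations can be seen by comparing the number of complete cases with the number of responses available per item. The sketch below uses a small, made-up data set and is not the Mplus procedure applied in the study.

```python
import numpy as np
import pandas as pd

# Hypothetical responses from five teachers on three items, with some item
# non-response coded as NaN.
df = pd.DataFrame({
    "w1": [4, 5, np.nan, 4, 3],
    "w2": [5, 4, 4, np.nan, 3],
    "c1": [3, np.nan, 4, 4, 2],
})

print("Complete cases (listwise deletion):", len(df.dropna()))  # 2
print("Observations available per item:")
print(df.count())  # 4 responses per item remain usable by a full-data method
```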

Acknowledgements The author is grateful to Trude Havik for her thorough work in administering the data collection and in preparing the data for analysis.


Appendix

Table
Descriptive statistics for items constituting warmth and control for the two intervention groups in sample 2.

| Item | Time 1 Gr 1 M (SD) | Time 1 Gr 2 M (SD) | Time 2 Gr 1 M (SD) | Time 2 Gr 2 M (SD) | Time 3 Gr 1 M (SD) | Time 3 Gr 2 M (SD) |
|---|---|---|---|---|---|---|
| Warmth | | | | | | |
| 1. I work actively to create good relationships with my pupils | 4.44 (0.63) | 4.51 (0.60) | 4.45 (0.65) | 4.38 (0.65) | 4.45 (0.62) | 4.45 (0.68) |
| 2. I show interest in each pupil | 4.45 (0.63) | 4.53 (0.61) | 4.47 (0.64) | 4.54 (0.59) | 4.47 (0.61) | 4.44 (0.60) |
| 3. I often praise my pupils | 4.24 (0.71) | 4.29 (0.73) | 4.31 (0.72) | 4.25 (0.74) | 4.31 (0.71) | 4.43 (0.72) |
| 4. I show the pupils that I care about them (not only when it comes to academic work) | 4.46 (0.68) | 4.56 (0.62) | 4.51 (0.64) | 4.40 (0.70) | 4.53 (0.63) | 4.51 (0.64) |
| Control | | | | | | |
| 5. I have established routines/rules for how the pupils are supposed to act when they change activity/workplace etc. | 3.52 (0.89) | 3.66 (0.93) | 3.70 (0.84) | 3.71 (0.93) | 3.75 (0.86) | 4.85 (0.87) |
| 6. I have established routines/rules for how the pupils are supposed to act in plenary teaching sessions | 4.14 (0.79) | 4.28 (0.78) | 4.39 (0.71) | 4.28 (0.74) | 4.40 (0.65) | 4.35 (0.71) |
| 7. I have established routines/rules for individual work | 3.98 (0.75) | 4.05 (0.84) | 4.19 (0.71) | 4.11 (0.76) | 4.24 (0.69) | 4.25 (0.74) |
| 8. I am closely monitoring the pupils' behaviour in class | 4.31 (0.70) | 4.31 (0.72) | 4.40 (0.65) | 4.28* (0.74) | 4.40 (0.63) | 4.44 (0.66) |

Note. M = item means; SD = standard deviation. Rating format is 0–5, where 0 = never and 5 = very often. Gr 1 = Respect; Gr 2 = Handbook for classroom management. *p < .05.

References

Baker, J. A., Clark, T. P., Crowl, A., & Carlson, J. S. (2009). The influence of authoritative teaching on children's school adjustment: are children with behavioural problems differentially affected? School Psychology International, 30(4), 374–382.
Baumrind, D. (1967). Child care practices anteceding three patterns of preschool behavior. Genetic Psychology Monographs, 75, 43–88.
Baumrind, D. (1991). Parenting styles and adolescent development. In J. Brooks-Gunn, R. Lerner, & A. C. Peterson (Eds.), The encyclopedia of adolescence (pp. 746–758). New York: Garland.
Brekelmans, W., Wubbels, T., & van Tartwijk, J. (2005). Teacher–student relationships across the teacher career. International Journal of Educational Research, 43, 55–71.
Brophy, J. E. (1996). Teaching problem students. New York: Guilford.
Bru, E., Stephens, P., & Torsheim, T. (2002). Students' perception of class management and reports of their own misbehaviour. Journal of School Psychology, 40(4), 287–307.
Buhi, E. R., Goodson, P., & Neilands, T. B. (2008). Out of sight, not out of mind: strategies for handling missing data. American Journal of Health Behaviour, 32(1), 83–92.
Connor, C. M., Son, S.-H., Hindman, A. H., & Morrison, F. J. (2005). Teacher qualifications, classroom practices, family characteristics, and preschool experience: complex effects on first graders' vocabulary and early reading outcomes. Journal of School Psychology, 43(4), 343–375.
Cornelius-White, J. (2007). Learner-centered teacher–student relationships are effective: a meta-analysis. Review of Educational Research, 77(1), 113–143.
Curran, P. J., West, S. G., & Finch, J. F. (1996). The robustness of test statistics to nonnormality and specification error in confirmatory factor analysis. Psychological Methods, 1(1), 16–29.
den Brok, P., Brekelmans, M., & Wubbels, T. (2006). Multilevel issues in research using students' perceptions of learning environments: the case of the questionnaire on teacher interaction. Learning Environments Research, 9(3), 199–213.
Doyle, W. (1986). Classroom organization and management. In M. C. Wittrock (Ed.), Handbook of research in teaching (3rd ed., pp. 392–421). New York: Macmillan Publishing Company.
Ertesvåg, S. K. (2009). Classroom leadership – the effect of a school development programme. Educational Psychology, 29(5), 515–539.
Ertesvåg, S. K. Improving teacher collaboration through school wide interventions, submitted for publication.
Ertesvåg, S. K., & Vaaland, G. S. (2007). Prevention and reduction of behavioural problems in school: an evaluation of the Respect program. Educational Psychology, 27(6), 713–736.
Evertson, C. M., & Weinstein, C. S. (2006). Classroom management as a field of inquiry. In C. M. Evertson, & C. S. Weinstein (Eds.), Handbook of classroom management: Research, practice and contemporary issues (pp. 3–15). Lawrence Erlbaum Associates.
Goddard, Y. L., Goddard, R. D., & Tschannen-Moran, M. (2007). Theoretical and empirical investigation of teacher collaboration for school improvement and student achievement in public elementary schools. Teachers College Record, 109(4), 877–896.
Good, T. L., & Brophy, J. (2007). Looking in classrooms (10th ed.). Boston: Allyn and Bacon.
Graham, P. (2007). Improving teacher effectiveness through structured collaboration: a case study of a professional learning community. RMLE Online: Research in Middle Level Education, 31(1), 1–17.
Hamre, B. K., & Pianta, R. C. (2005). Can instructional and emotional support in the first grade classroom make a difference for children at risk of school failure? Child Development, 76(5), 949–967.
Hinkin, T. R. (1998). A brief tutorial on the development of measures for use in survey questionnaires. Organisational Research Methods, 1(1), 104–121.
Hu, L.-T., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. Structural Equation Modeling, 6(1), 1–55.
Hughes, J. N. (2002). Authoritative teaching: tipping the balance in favor of school versus peer effects. Journal of School Psychology, 40(6), 485–492.
Hughes, J. N., & Kwok, O. (2006). Classroom engagement mediates the effect of teacher–student support on elementary students' peer acceptance: a prospective analysis. Journal of School Psychology, 43(6), 465–480.
Hui, E. K. P., & Sun, R. C. F. (2010). Chinese children's perceived school satisfaction: the role of contextual and intrapersonal factors. Educational Psychology, 30(2), 155–172.
Idsoe, T., Hagtvet, K. A., Bru, E., Midthassel, U. V., & Knardahl, S. (2008). Antecedents and outcomes of intervention program participation and task priority change among school psychology counselors: a latent variable growth framework. Journal of School Psychology, 46(1), 23–52.
Jeličić, H., Phelps, E., & Lerner, R. M. (2009). Use of missing data methods in longitudinal studies: the persistence of bad practices in developmental psychology. Developmental Psychology, 45(4), 1195–1199.
Jong, R. d., & Westerhof, K. J. (2001). The quality of student ratings of teacher behaviour. Learning Environments Research, 4(1), 51–85.
Jöreskog, K. G. (1993). Testing structural equation models. In K. A. Bollen, & J. S. Long (Eds.), Testing structural equation models (pp. 294–316). Newbury Park: Sage.
Kaplan, D. (1989). Model modification in covariance structure analysis: application of the expected parameter change statistic. Multivariate Behavioral Research, 24(3), 285–305.
Kim, Y. H., Stormont, M., & Espinosa, L. (2009). Contributing factors to South Korean early childhood educators' strategies for addressing children's challenging behaviors. Journal of Early Intervention, 31(3), 227–249.
Kleinfeld, J. (1975). Effective teachers of Eskimo and Indian students. School Review.
Kounin, J. S. (1970). Discipline and group management in classrooms. New York: Holt, Rinehart & Winston.
Kunter, M., & Baumert, J. (2006). Who is the expert? Construct and criteria validity of students and teacher ratings of instruction. Learning Environments Research, 9(3), 231–251.
Kuntsche, E., Gmel, G., & Rehm, J. (2006). The Swiss teaching style questionnaire (STSQ) and adolescent problem behaviors. Swiss Journal of Psychology, 65(3), 147–155.
Lewis, R. (2006). Classroom discipline in Australia. In C. M. Evertson, & C. S. Weinstein (Eds.), Handbook of classroom management: Research, practice and contemporary issues (pp. 1193–1214). Lawrence Erlbaum Associates.
Little, J. W. (1982). Norms of collegiality and experimentation: workplace conditions of school success. American Educational Research Journal, 19(3), 325–346.
Little, T. D. (1997). Mean and covariance structures (MACS) analyses of cross-cultural data: practical and theoretical issues. Multivariate Behavioral Research, 32(1), 53–76.
Loizou, E. (2009). In-service early childhood teachers reflect on their teacher training program: reconceptualizing the case of Cyprus. Journal of Early Childhood Teacher Education, 30(3), 195–209.
MacCallum, R. C., Browne, M. W., & Cai, L. (2006). Testing differences between nested covariance structure models: power analysis and null hypotheses. Psychological Methods, 11(1), 19–35.
MacCallum, R. C., Browne, M. W., & Sugawara, H. M. (1996). Power analysis and determination of sample size for covariance structure modeling. Psychological Methods, 1(2), 130–149.
Maccoby, E. E. (1992). The role of parents in the socialization of children: an historical overview. Developmental Psychology, 28(6), 1006–1017.
Mäkikangas, A., Feldt, T., Kinnunen, U., Tolvanen, A., Kinnunen, M.-L., & Pulkkinen, L. (2006). The factor structure and factorial invariance of the 12-item general health questionnaire (GHQ-12) across time: evidence from two community-based samples. Psychological Assessment, 18(4), 444–451.
Manning, M. L., & Bucher, K. T. (2003). Classroom management: Models, applications and cases. Upper Saddle River, NJ: Prentice Hall.
Marchant, G. J., Paulson, S. E., & Rothlisberg, B. A. (2001). Relations of middle school students' perceptions of family and school contexts with academic achievement. Psychology in the Schools, 38(6), 505–519.
McManus. (1989). Troublesome behaviour in the classroom. London: Routledge.
Meirink, J. A., Meijer, P. C., & Verloop, N. (2007). A closer look at teachers' individual learning in collaborative settings. Teachers and Teaching: Theory and Practice, 13(2), 145–164.
Midthassel, U. V. (2006). Creating a shared understanding of classroom management. Educational Management Administration & Leadership, 34(3), 365–383.
Midthassel, U. V., Bru, E., & Idsøe, T. (2000). The principal's role in promoting school development activity in Norwegian compulsory schools. School Leadership & Management, 20(2), 147–160.
Midthassel, U. V., & Ertesvåg, S. K. (2008). Schools implementing Zero – the process of implementing an anti-bullying programme in six Norwegian compulsory schools. Journal of Educational Change, 9(2), 153–172.
Moos, R. (1978). A typology of junior high and high school classrooms. American Educational Research Journal, 15(1), 53–66.
Morrison, T. L. (1974). Control as an aspect of group leadership in classrooms: a review of research. Journal of Education, Boston, 156(4), 38–64.
Munthe, E. (2001). Professional uncertainty/certainty: how (uncertain) are teachers, what are they (un)certain about and how is (un)certainty related to age, experience, gender, qualifications and school type? European Journal of Teacher Education, 24(3), 355–368.
Munthe, E. (2003a). Teachers' professional certainty. A survey of Norwegian teachers' perception of professional certainty in relation to demographic, workplace, and classroom variables. Doctoral thesis. Faculty of Education, University of Oslo.


Munthe, E. (2003b). Teachers' workplace and professional certainty. Teaching and Teacher Education, 19(8), 801–813.
Muthén, B. (1997). Latent variable modelling with longitudinal and multilevel data. In A. Raftery (Ed.), Sociological methodology (pp. 453–480). Boston: Blackwell Publishers.
Muthén, B., & Kaplan, D. (1985). A comparison of some methodologies for the factor analysis of non-normal Likert variables. British Journal of Mathematical and Statistical Psychology, 38(2), 171–189.
Muthén, B., & Kaplan, D. (1992). A comparison of some methodologies for the factor analysis of non-normal Likert variables: a note on the size of the model. British Journal of Mathematical and Statistical Psychology, 45(1), 19–30.
Muthén, L. K., & Muthén, B. O. (1998–2007). Mplus user's guide (5th ed.). Los Angeles, CA: Muthén & Muthén.
Nordahl, T., Gravrok, Ø., Knutsmoen, H., Larsen, T. M. B., & Rørnes, K. (Eds.). (2006). Forebyggende innsatser i skolen [Preventive actions in school]. Oslo: Sosial- og helsedirektoratet/Utdanningsdirektoratet.
Norusis, M. J. (2007). SPSS 15.0 guide to data analysis. Chicago, IL: Prentice Hall, Inc.
Olweus, D. (2004). The Olweus bullying prevention programme: design and implementation issues and a new national initiative in Norway. In P. K. Smith, D. Pepler, & K. Rigby (Eds.), Bullying in schools (pp. 13–36). Cambridge: Cambridge University Press.
Patrick, H., Turner, J. C., Meyer, D. K., & Midgley, C. (2005). How teachers establish psychological environments during the first days of school: associations with avoidance in mathematics. Teachers College Record, 105(8), 1521–1558.
Pellerin, L. A. (2005). Applying Baumrind's parenting typology to high schools: towards a middle-range theory of authoritative socialization. Social Science Research, 34(2), 283–303.
Pianta, R. C., Belsky, J., Vandergrift, N., Houts, R., & Morrison, F. J. (2008). Classroom effects on children's achievement trajectories in elementary school. American Educational Research Journal, 45(2), 365–397.
Roeser, R. W., Midgley, C., & Urdan, T. C. (1996). Perceptions of the school psychological environment and early adolescents' psychological and behavioral functioning in school: the mediating role of goals and belonging. Journal of Educational Psychology, 88(3), 408–422.
Roland, E., & Galloway, D. (2002). Classroom influences on bullying. Educational Research, 44(3), 299–312.
Sokal, L., Smith, D. G., & Mowat, H. (2003). Alternative certification teachers' attitudes toward classroom management. The High School Journal, 86(3), 8–16.
Statistics Norway. (1994). Standard classification of municipalities 1994. Oslo: Statistics Norway.
van Tartwijk, J., den Brok, P., Veldman, I., & Wubbels, T. (2009). Teachers' practical knowledge about classroom management in multicultural classrooms. Teaching and Teacher Education, 25(3), 453–460.
Vescio, V., Ross, D., & Adams, A. (2008). A review of research on the impact of professional learning communities on teaching practice and student learning. Teaching and Teacher Education, 24(1), 80–91.
Walker, J. M. (2008). Looking at teacher practices through the lens of parenting style. Journal of Experimental Education, 76(2), 218–240.
Wentzel, K. R. (2002). Are effective teachers like good parents? Teaching styles and student adjustment in early adolescence. Child Development, 73(1), 287–301.