Predictors of developmentally appropriate classroom practices in kindergarten through third grade


Early Childhood Research Quarterly 16 (2001) 431– 452

Kelly L. Maxwell (a,*), R. A. McWilliam (a), Mary Louise Hemmeter (b), Melinda Jones Ault (b), John W. Schuster (b)

(a) University of North Carolina-Chapel Hill, Frank Porter Graham Child Development Center, CB #8180, Chapel Hill, NC 27599-8180, USA
(b) University of Kentucky, Department of Special Education, Kentucky, USA

Abstract

This study was designed to (a) test the psychometric properties of a new observation measure of developmentally appropriate classroom practices in kindergarten through third-grade classrooms, and (b) determine how well classroom and teacher characteristics predict developmentally appropriate classroom practices. Teacher-reported and observational data from 69 classrooms provided support for construct validity, internal consistency, and interrater agreement of the Assessment of Practices in Early Elementary Classrooms (APEEC) measure. Hierarchical multiple regression analyses indicated that classroom characteristics (grade, class size, number of children with disabilities), teacher characteristics (education level, years of experience) and teacher beliefs (developmentally appropriate beliefs and developmentally inappropriate beliefs) accounted for 42% of the variance in observed classroom practices. With all variables in the model, teacher education, grade, and beliefs in developmentally appropriate and inappropriate practice accounted for most of the variance in observed classroom practices. © 2001 Elsevier Science Inc. All rights reserved.

This study was conducted with support from the U.S. Department of Education, Office of Special Education Programs, Early Childhood Programs for Children with Disabilities (Grant No. H024Q5001), Early Childhood Follow-Through Research Institute. The opinions expressed here are those of the authors and may not be those of the funding agency. Thanks are extended to the following contributors: Paulette Chetney, Rebecca Blair Gateskill, Janet Hovekamp, Syndee Kraus, Beth Partington, Cynthia Pendergrast, Canby Robinson, Kim Sloper, Brian Sullivan, and Kathy Watkins. We are especially grateful to the teachers and children who welcomed us into their classrooms.

* Corresponding author. Tel.: +1-919-966-9865; fax: +1-919-966-7532. Email address: [email protected] (K. L. Maxwell).

0885-2006/01/$ – see front matter © 2001 Elsevier Science Inc. All rights reserved. PII: S0885-2006(01)00118-1


1. Introduction

The phrase “developmentally appropriate practice” has been used to characterize early childhood and primary grade classrooms in which children are engaged meaningfully in learning activities, use hands-on materials to support their learning, and actively construct their knowledge. The teacher in a developmentally appropriate classroom often acts as a facilitator of learning and makes education decisions based on (a) research on child development and learning, (b) knowledge of individual children’s strengths and needs, and (c) knowledge of children’s social and cultural contexts (Bredekamp & Copple, 1997). Many of the ideas described as developmentally appropriate practice come from Piaget and Vygotsky and represent a constructivist perspective of knowledge acquisition (Hart, Burts, & Charlesworth, 1997).

The National Association for the Education of Young Children’s (NAEYC) position statement on developmentally appropriate practice (DAP) applies to children birth through 8 years and includes a clear articulation of appropriate practices for each age group of children. However, most measures of and research on the concept of developmentally appropriate practice focus on children birth through kindergarten. Much less work has been done to understand developmentally appropriate practice in the primary grades, and very few observation tools have been created specifically to measure developmentally appropriate practice in the early elementary grades. The few observation tools cited in the school-age literature were either originally intended to be used in preschool settings (e.g., the Classroom Practices Inventory, Hyson, Hirsh-Pasek, & Rescorla, 1990) or in kindergarten (e.g., Checklist for Rating Developmentally Appropriate Practice in Kindergarten Classrooms, Charlesworth, Hart, Burts, & Hernandez, 1991) or have little evidence of their psychometric properties (e.g., the Scale of Primary Classroom Practices, Burt, Sugawara, & Wright, 1993).
These measures cannot easily be used in 1st-3rd grade classrooms because some of the items are not as applicable to second- and third-grade classrooms. One of the purposes of this study was to develop and test a new observation measure of developmentally appropriate practices to be used in K-3rd grade classrooms. This new measure was designed to assess components of global quality that could be found in any classroom environment in kindergarten, first, second, or third grade. Thus, this new tool measures constructs similar to those of other DAP measures, but the items are designed to maximize applicability across grade levels. Because of our interest in assessing global dimensions of DAP across K-3rd grade classrooms, the measure did not include items to assess specific curriculum content.

It is important not only to describe current elementary school classroom practices but also to identify the factors that affect practices. Why are some classrooms more developmentally appropriate than others? One way to address this question is to examine the proportion of variance in classroom practices that can be explained by a set of predictor variables. This approach was used in a recent study to predict the beliefs and practices of K-3rd grade teachers using a variety of classroom structural variables and teacher demographic characteristics (Buchanan, Burts, Bidner, White, & Charlesworth, 1998). Whereas Buchanan and her colleagues were interested in identifying the characteristics that best predicted beliefs and practices, separately, we were interested in understanding whether teacher beliefs contributed to the prediction of observed classroom practices above and beyond other classroom and teacher characteristics that may limit (e.g., class size) or influence (e.g., education level) teachers’ practices. Previous research has demonstrated small to modest correlations between teacher beliefs and developmentally appropriate practice (Bryant, Clifford, & Peisner, 1991; Charlesworth et al., 1991; Charlesworth et al., 1993; Oakes & Caruso, 1990; Stipek & Byler, 1997; Vartulli, 1999). Practically, we were interested in understanding the role of teacher beliefs as a potential strategy for changing practices. If beliefs add uniquely to the prediction of observed practices, then it may be possible to change practices by changing beliefs. Although changing beliefs may not be simple, it may be a more feasible strategy than changing structural characteristics such as class size. Previous research has shown the positive effects of changing preschool teacher beliefs and classroom practices (Cassidy, Buell, Pugh-Hoese, & Russell, 1995).

As in the study conducted by Buchanan and her colleagues, we included a set of classroom and teacher characteristics as predictors of developmentally appropriate classroom practices. We used as classroom characteristics grade level, class size, and number of students with disabilities. Grade was included in our predictor model because previous research has shown a decline in developmentally appropriate practice as grade level increases (Buchanan et al., 1998; Homes & Morrison, 1994; Sherman & Mueller, 1996; Vartulli, 1999). We included class size because the NAEYC guidelines and research from early childhood programs suggest that smaller class size may be related to more developmentally appropriate practice (Bredekamp & Copple, 1997; Frede, 1995). Finally, we included the number of children with disabilities in the class.
According to a national survey of elementary school teachers, approximately 62% of general education kindergarten classrooms enroll students with disabilities; the percentage increases to 86% in third-grade classrooms (Early Childhood Follow-Through Research Institute, 1996). With so many inclusive classrooms, it is important to understand classroom practices for children both with and without disabilities. It is also possible that classroom practices differ for inclusive versus noninclusive classrooms. If classroom teachers are responsible for implementing children’s IEPs (Individualized Education Plans), then we would expect to see more individualized (and possibly more developmentally appropriate) practices in classrooms with more children with disabilities (Salisbury et al., 1994). Buchanan et al. (1998) found that teachers of 1st-3rd grade classrooms with fewer children with disabilities reported using more inappropriate activities than teachers who had children with disabilities in their classrooms.

For teacher characteristics, we used highest level of education, years of experience teaching K-3rd grade, and teacher beliefs in developmentally appropriate and inappropriate practices. The child care literature suggests that teachers with more education are more likely to implement developmentally appropriate practice (Cassidy et al., 1995; Cost, Quality, and Outcomes Study Team, 1995; Phillips & Howes, 1987; Whitebook, Howes, & Phillips, 1989). The school age literature has not demonstrated such a relationship (Buchanan et al., 1998; Stipek, Daniels, Galluzo, & Milburn, 1992; Vartulli, 1999), possibly because the education level among school teachers is more restricted than that of child care teachers. Although the literature is mixed, we included teacher education because we wanted to determine the effects of teacher beliefs over and above teacher education.

It is also reasonable to expect teachers’ years of experience to influence their classroom practices. The direction of the relationship, however, is arguable. Teachers with less experience could implement more developmentally appropriate practices because they were trained more recently and were more likely to have been exposed to the concept of developmentally appropriate practice. On the other hand, as teachers learn from their experiences with young children and become more comfortable in their role as teachers, they may adopt more developmentally appropriate practices. Research on the relationship between teacher experience and classroom practices is mixed. Some studies have reported a relationship between experience and developmentally appropriate classroom practice (Dunn, 1993; Vartulli, 1999; Whitebook et al., 1989) while others have not demonstrated a relationship between the two variables (Buchanan et al., 1998; Buysse, Wesley, Bryant, & Gardner, 1999; Cost, Quality, and Outcomes Study Team, 1995). As with education, though, we included experience in the model to control for its effects on the relationship between teacher beliefs and practices.

Finally, we included teacher beliefs because previous research has demonstrated a relationship between teacher beliefs and practices in the early elementary grades (Buchanan et al., 1998; Stipek & Byler, 1997). Previous research suggests that teacher beliefs about developmentally appropriate practices are distinct from beliefs about developmentally inappropriate practices (Buchanan et al., 1998; Stipek & Byler, 1997). In other words, teachers may believe in both developmentally appropriate and developmentally inappropriate practices. Thus, both types of beliefs were included in the model.

This study was designed to address two major research questions. First, what are the psychometric properties of a new observation measure of developmentally appropriate classroom practices in K-3rd grade?
This is the first step in the development process of this new observation measure—to examine its psychometric properties and relationship with other variables. This basic question must be answered adequately before turning to other important issues, like the relationship between this global measure of classroom quality and children’s outcomes, in future research. Second, how well do classroom and teacher characteristics predict classroom practices? We especially were interested in determining whether teacher beliefs would predict practices after controlling for other classroom and teacher characteristics.

2. Method

2.1. Participants

Through mail and phone calls, 69 K-3rd grade classroom teachers in 40 public elementary schools in central North Carolina and central Kentucky were recruited. Each had at least one child with disabilities enrolled. The goal was to recruit 40 teachers from North Carolina and 30 from Kentucky; we were 1 teacher short of the goal in Kentucky.

Of the 69 classrooms, 49 had children from a single grade and 19 had children from multiple grades. Of the 49 single-grade classrooms, 12 were kindergarten, 14 were first grade, 11 were second grade, and 12 were third grade. All but 1 of the single-grade kindergarten classrooms were full day. Of the 19 multigrade classrooms, 6 were K-1 combinations, 6 were grade 2–3 combinations, 3 were 1–3 grade combinations, 2 were K-3 combinations, 1 was a K-2 combination, and 1 was a 1–2 grade combination. Of the 9 multigrade classrooms that included kindergartners, 3 were half-day programs for kindergarten students. All the multigrade classrooms were from Kentucky. Classrooms of children from multiple grades are more common in Kentucky because the Kentucky Education Reform Act requires elementary schools to create multiage classrooms (Alston et al., 1998). In North Carolina, single-grade classrooms are the norm.

The APEEC psychometric properties were investigated using all classrooms. In the regression analyses, we excluded 2 classrooms with missing data and 7 multigrade classrooms that could not be categorized as serving either grades K-1 or 2–3. Approximately half of the teachers (49%) had a master’s degree; the others (51%) had a bachelor’s degree. The teachers varied widely in their years of experience teaching K-3rd grade (M = 9.3 years, range = 1–30 years). All teachers were certified to teach elementary school. On average, each class enrolled 23 children (range = 16–30 children) and included 3 children with disabilities (range = 1–7). Descriptive information for the 60 classrooms used in the regression analyses is provided in Table 1.

Table 1
Participant characteristics for the K–1 and 2–3 grade groupings used in the regression analyses (N = 60)

| Characteristic | Grade K–1 (N = 31) | Grade 2–3 (N = 29) |
|---|---|---|
| Highest degree: Bachelor’s | 55% | 52% |
| Highest degree: Master’s | 45% | 48% |
| Years of K–3 experience, M (range) | 7.2 (1–23) | 10.3 (1–30) |
| Class size, M (range) | 22 (17–26) | 24 (16–30) |
| Number of children with disabilities, M (range) | 3 (1–7) | 3 (1–7) |

2.2. Measures

2.2.1. Assessment of Practices in Early Elementary Classrooms

The Assessment of Practices in Early Elementary Classrooms (APEEC, Hemmeter, Maxwell, Ault, & Schuster, 2001) was designed to measure individualized and developmentally appropriate practices in K-3rd grade general education classrooms. It is a global measure of classroom quality designed to apply equally to grades K-3rd. As such, it does not include detailed information about the content of instruction.
We used the NAEYC guidelines for developmentally appropriate practice (Bredekamp & Copple, 1997) and recommended practices for early childhood special education programs (see Odom & McLean, 1996) as a framework for developing items. Although the concept of developmentally appropriate practices is supposed to apply to all children, some in the early childhood special education field have expressed concerns that DAP may not be sufficient for meeting the individual needs of children with disabilities (Carta, Schwartz, Atwater, & McConnell, 1991). In this paper, we use the phrase “individualized and developmentally appropriate practices” periodically to emphasize that the APEEC is intended to measure quality practices for children with and without disabilities.

Twenty-one early childhood researchers (70% response rate) and 25 early childhood practitioners (83% response rate) reviewed the initial 40-item measure. Based on the reviewers’ comments and pilot testing of the instrument, we revised the APEEC and reduced the number of items from 40 to 16. Each of the APEEC’s 16 items contains two or more descriptors at each of the 1, 3, 5, and 7 anchors. Each descriptor is scored as yes, no, or n/a. Observers use the descriptor scores to establish a rating of 1–7 for each item. On the 7-point continuum, a score of 1 represents developmentally inappropriate practice and a score of 7 represents ideal developmentally appropriate practice.

The majority of descriptors are scored on the basis of observation. Teacher report may be used in a few instances when the descriptor is not observed. This use of observation as the primary data source, with teacher report used only as needed, is similar to other classroom observation measures such as the Early Childhood Environment Rating Scale-Revised (ECERS-R, Harms, Clifford, & Cryer, 1998). Almost all APEEC items are applicable to practices for all children, with and without disabilities. One item applies specifically to children with disabilities (i.e., participation of children with disabilities in classroom activities). The mean total score (i.e., all items summed and divided by 16) was used in the analyses.
Psychometric properties, described in the results section, confirm that the total score was internally consistent and represented developmentally appropriate classroom practices. The descriptive title for each APEEC item is included in Table 2 and four sample APEEC items are included in the appendix.

Table 2
Mean scores for each APEEC item (N = 69) and mean interrater agreement indices for each APEEC item (N = 59)

| APEEC item | M (a) | SD | Range | % Exact agree | % Agree within 1 point | Weighted kappa |
|---|---|---|---|---|---|---|
| 1. Room arrangement | 3.9 | 1.6 | 2–7 | 59 | 78 | .62 |
| 2. Display of child products | 4.4 | 1.6 | 1–7 | 61 | 86 | .67 |
| 3. Classroom accessibility | 4.9 | 1.5 | 1–7 | 36 | 75 | .39 |
| 4. Health and classroom safety | 2.3 | 1.3 | 1–7 | 86 | 95 | .78 |
| 5. Use of materials | 4.3 | 1.7 | 1–7 | 41 | 75 | .53 |
| 6. Use of computers | 4.6 | 1.7 | 1–7 | 66 | 85 | .68 |
| 7. Monitoring child progress | 3.3 | 1.3 | 1–7 | 53 | 78 | .47 |
| 8. Teacher-child language | 4.2 | 1.7 | 1–7 | 39 | 73 | .48 |
| 9. Instructional methods | 3.8 | 1.4 | 1–7 | 68 | 86 | .67 |
| 10. Integration and breadth of subjects | 2.8 | 1.4 | 1–6 | 63 | 85 | .61 |
| 11. Children’s role in decision-making | 3.7 | 1.7 | 2–7 | 63 | 90 | .72 |
| 12. Participation of children with disabilities in classroom activities | 3.4 | 1.4 | 1–7 | 58 | 80 | .55 |
| 13. Social skills | 3.6 | 1.9 | 1–7 | 58 | 73 | .58 |
| 14. Diversity | 2.2 | 1.4 | 1–6 | 63 | 83 | .58 |
| 15. Appropriate transitions | 4.0 | 1.6 | 2–7 | 41 | 69 | .41 |
| 16. Family involvement | 3.6 | 1.8 | 2–7 | 71 | 80 | .66 |

(a) Range of possible scores = 1–7

2.2.2. Assessment Profile for Early Childhood Programs

The Assessment Profile for Early Childhood Programs: Research Version (Abbott-Shim & Sibley, 1998) is an observation checklist designed to measure early childhood classrooms and teaching practices that support children’s development. The measure consists of 60 yes-no items covering 5 scales of developmentally appropriate practice: learning environment, scheduling, curriculum, interacting, and individualizing. Correlations between Profile scores and scores from another observation measure of preschool classroom practices, the Early Childhood Environment Rating Scale (Harms & Clifford, 1980), have been found to range from 0.65 to 0.75, demonstrating the construct validity of the Profile scores as measuring global quality (Abbott-Shim, Sibley, & Neel, 1992). Although the Profile is used primarily in preschool and kindergarten classrooms, it recently has been used in primary grades as well (Abbott-Shim, Sibley, & McCarty, 1997; Abbott-Shim, Sibley, & Neel, 1998; Huffman & Speer, 2000; Sherman & Mueller, 1996).

We included the Profile in our validity study of the APEEC because (a) the Profile, like the APEEC, is an observational measure, (b) the Profile has been used recently in the primary grades, and (c) evidence supports the construct validity of the Profile scores as a measure of global classroom quality. A high positive correlation between these two measures of the same theoretical construct using the same measurement method would provide evidence of the construct validity of APEEC scores as a measure of global classroom quality (Crocker & Algina, 1986). Using data from the current validity study, the mean interrater agreement for the Profile was 82% (range = 58–97%). In our analysis, we used a proportion score (i.e., summing the items scored yes and then dividing by 60). Because the Profile item data are dichotomous, we used the Kuder-Richardson 20 (KR-20) to calculate the internal consistency of the Profile total score. The KR-20 for the total scores in this study was 0.89, indicating that the Profile total score was internally consistent.

2.2.3. Caregiver Interaction Scale

The Caregiver Interaction Scale (CIS; Arnett, 1989) is an observation rating measure of teacher interactions with children. The scale consists of 26 items measuring the teacher’s sensitivity, harshness, detachment, and permissiveness. Each item is rated on a 4-point Likert scale indicating the extent to which the teacher demonstrates particular characteristics, from not at all (1) to very much (4). Ratings on this measure have been associated with higher quality classroom practices in preschool (Peisner-Feinberg & Burchinal, 1997). Because the teacher-child relationship is an important component of DAP, we expected the CIS to be related to a global quality measure of K-3rd grade. We, therefore, included it in the validity study of the APEEC. A high, positive correlation between the APEEC and the CIS would provide evidence that the APEEC is measuring at least some component of DAP, particularly the teacher-child relationship. Using data from the current validity study, interrater agreement within 1 point for the CIS was 100% across classrooms; exact interrater agreement ranged from 27% to 85%, with a mean of 60%. We used in the analysis the CIS total mean score (i.e., reversing negatively worded items then summing all items and dividing by 26). Using data from our study, the CIS scores demonstrated high internal consistency (α for total mean score = 0.93).

2.2.4. Teacher Beliefs and Practices Scale

The kindergarten and primary grade versions of the Teacher Beliefs and Practices Scale were designed to measure (a) teacher beliefs about developmentally appropriate and inappropriate practices and (b) teacher-reported frequency ratings of particular practices or activities representing developmentally appropriate and inappropriate practices. The authors of the instrument used the NAEYC guidelines for developmentally appropriate practice (Bredekamp, 1987) to define appropriate and inappropriate classroom beliefs and practices. The Teacher Beliefs and Practices Scale: Kindergarten Version (TBPS-K, Charlesworth et al., 1993) consists of a 36-item measure of teacher beliefs and a 34-item measure of teacher practices (see also Charlesworth et al., 1991, for information on instrument development and psychometric properties). This scale was used with kindergarten teachers. The Teacher Beliefs and Practices Scale: Primary Grade Version (TBPS-P, Buchanan et al., 1998) consists of a 42-item measure of teacher beliefs and a 33-item measure of teacher practices (see Buchanan et al., 1998, for information on instrument development and psychometric properties). This scale was used by 1st-3rd grade teachers and teachers who taught K-1 multigrade classrooms. All items are rated on a 5-point Likert scale. In previous studies, TBPS scores have been related to grade level, class size, teacher education, and children’s stress behaviors (Buchanan et al., 1998; Burts, Hart, Charlesworth, Fleege, Mosley, & Thomasson, 1992; Cassidy et al., 1995). We included in the analyses only the teacher belief and practice items that were the same on both versions of the TBPS.
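The internal consistency coefficients reported for the Profile (KR-20) and the CIS (coefficient alpha) follow standard formulas. The Python sketch below is an illustration only, not the software used in the study. The function names, the `negative_items` argument, and the use of population variances are assumptions made for the example; the text does not say which CIS items are negatively worded. With 0/1 item scores and population variances, KR-20 and coefficient alpha give the same value.

```python
from statistics import pvariance


def reverse_score(value, low=1, high=4):
    """Reverse a rating on a low..high scale (for example, 1 <-> 4)."""
    return low + high - value


def total_mean_score(ratings, negative_items=(), low=1, high=4):
    """Mean item rating after reversing the negatively worded items.

    negative_items holds hypothetical 0-based positions of reversed items.
    """
    adjusted = [reverse_score(r, low, high) if i in negative_items else r
                for i, r in enumerate(ratings)]
    return sum(adjusted) / len(adjusted)


def cronbach_alpha(rows):
    """Coefficient alpha; rows is a list of classrooms, each a list of item scores."""
    k = len(rows[0])
    # Variance of each item across classrooms.
    item_vars = [pvariance([row[j] for row in rows]) for j in range(k)]
    # Variance of the summed total score.
    total_var = pvariance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def kr20(rows):
    """KR-20 for dichotomous (0/1) items, using p * (1 - p) as the item variance."""
    k = len(rows[0])
    n = len(rows)
    pq_sum = 0.0
    for j in range(k):
        p = sum(row[j] for row in rows) / n  # proportion scored "yes"
        pq_sum += p * (1 - p)
    total_var = pvariance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - pq_sum / total_var)
```

Both coefficients are undefined when every classroom has the same total score, because the total-score variance is then zero.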
We created three summary scores by summing the items common to both versions: developmentally appropriate beliefs (TBPS-DA Beliefs; 14 items), developmentally inappropriate beliefs (TBPS-DI Beliefs; 9 items), and developmentally appropriate practices (TBPS-DA Practices; 17 items). Using data from our study, these summary scores demonstrated moderate internal consistency (TBPS-DA Beliefs α = 0.78, TBPS-DI Beliefs α = 0.72, TBPS-DA Practices α = 0.73).

2.2.5. Teacher demographic questionnaire

Each participating teacher completed a 20-item background questionnaire about their schools (e.g., existence of a site-based management team), classrooms (e.g., number of children enrolled), and themselves (e.g., education, years of experience). For this study, only the teachers’ highest level of education, years of experience teaching K-3rd grade, class size, and number of children with disabilities enrolled in each classroom were used.

2.3. Procedure

Data were collected in the spring. We mailed participants the teacher demographic questionnaire and the TBPS-K or the TBPS-P, asking them to return the completed forms to us before the scheduled classroom observation. Pairs of trained observers spent an average of 5 hours and 45 minutes (range = 3 hours to 7 hours and 15 minutes) in each classroom during one day to complete three observational rating measures. Each participating teacher received a gift worth approximately $25.

2.3.1. Training

Before training other data collectors, the North Carolina and Kentucky trainers (N = 2) established interrater agreement between themselves on all observation measures. For each measure, their scores were within one point of each other on at least 80% of the items. Each trainer then trained data collectors (N = 5 in NC; N = 4 in KY) to that same 80% criterion. Throughout each site’s training and data collection phases, questions and clarifications were discussed between the two trainers to ensure consistency across sites.

2.4. Data analysis

Two sets of analyses were conducted. First, we examined the psychometric properties of APEEC scores, calculating interrater agreement at the descriptor, item, and total-score levels. For each item, we also calculated weighted kappa, which measures interrater agreement and accounts for chance agreement as well as the extent of disagreement between observers (e.g., a 1-point vs. a 4-point disagreement; Cohen, 1968). To examine the validity of the APEEC scores, we calculated Pearson correlations between the APEEC total mean score and the Profile and CIS total scores. Second, through hierarchical regression analyses we examined the ability of classroom characteristics, teacher characteristics, and teacher beliefs to predict observed classroom practices.
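The item-level agreement indices described above can be sketched in Python as follows. This is an illustrative sketch, not the authors' code. Weighted kappa can be computed with different disagreement weights; the linear weights |i - j| used here are an assumption, since the weighting scheme is not stated in the text, and the function names are hypothetical.

```python
def agreement_rates(rater1, rater2):
    """Return (exact agreement, agreement within 1 point) as proportions."""
    n = len(rater1)
    exact = sum(a == b for a, b in zip(rater1, rater2)) / n
    within_one = sum(abs(a - b) <= 1 for a, b in zip(rater1, rater2)) / n
    return exact, within_one


def weighted_kappa(rater1, rater2, categories=range(1, 8)):
    """Weighted kappa with linear disagreement weights |i - j| (assumed).

    kappa_w = 1 - (observed weighted disagreement / chance-expected
    weighted disagreement), so larger disagreements are penalized more.
    """
    cats = list(categories)
    n = len(rater1)
    # Marginal proportion of each rating for each observer.
    p1 = {c: sum(a == c for a in rater1) / n for c in cats}
    p2 = {c: sum(b == c for b in rater2) / n for c in cats}
    # Mean weighted disagreement actually observed.
    observed = sum(abs(a - b) for a, b in zip(rater1, rater2)) / n
    # Mean weighted disagreement expected if the observers rated independently.
    expected = sum(abs(i - j) * p1[i] * p2[j] for i in cats for j in cats)
    if expected == 0:
        raise ValueError("kappa is undefined when both observers use one category")
    return 1 - observed / expected


# Hypothetical ratings of one item by two observers in four classrooms.
obs_a = [1, 2, 3, 4]
obs_b = [1, 2, 3, 5]
print(agreement_rates(obs_a, obs_b))
print(weighted_kappa(obs_a, obs_b))
```

Unlike the two percentage indices, weighted kappa corrects for the agreement expected by chance given each observer's distribution of ratings, which is why an item can show high agreement within 1 point but only a moderate kappa.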

3. Results

3.1. APEEC psychometric properties

Descriptive statistics for each item on the APEEC are provided in Table 2. Scores for 10 of the 16 items fell across the whole 7-point range of the scale; scores for the remaining items fell across 6 points of the scale. The mean score for each item generally fell in the midrange of the scale (2.2–4.6), and the standard deviation for each item was always greater than 1.

We analyzed interrater agreement for the APEEC in 59 classrooms. Interrater agreement was high at the descriptor, item, and total score levels. At the descriptor level, the percentage agreement between two observers across all 135 descriptors averaged 86% (range = 76–93%). At the item level, the average exact percentage agreement was 58% (range = 31–81%); the average percentage agreement within 1 point was 81% (range = 50–100%). The weighted kappas for 15 of the 16 APEEC items were above 0.40. Weighted kappas were greater than 0.60 for 8 of the 16 APEEC items. The median weighted kappa was 0.59. Weighted kappas of 0.41 to 0.60 can be considered moderate; kappas greater than 0.60 can be considered substantial (Landis & Koch, 1977). Table 2 presents the weighted kappas as well as percentages for exact agreement and agreement within 1 point for each item. These data suggest that a high level of interrater agreement can be established with the APEEC.
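The summary figures in this section can be checked against the item-level values in Table 2. The short Python sketch below is an illustration with hypothetical variable and function names. It copies the rounded values from Table 2 and applies the Landis and Koch (1977) labels. Because the table values are rounded, a result such as the median (about 0.595 here) may differ slightly from the reported figure.

```python
from statistics import mean, median

# Rounded values copied from Table 2, items 1-16 in order.
weighted_kappas = [.62, .67, .39, .78, .53, .68, .47, .48,
                   .67, .61, .72, .55, .58, .58, .41, .66]
pct_exact = [59, 61, 36, 86, 41, 66, 53, 39,
             68, 63, 63, 58, 58, 63, 41, 71]
pct_within_one = [78, 86, 75, 95, 75, 85, 78, 73,
                  86, 85, 90, 80, 73, 83, 69, 80]


def landis_koch_label(kappa):
    """Descriptive label for a kappa value, following Landis and Koch (1977)."""
    if kappa > 0.80:
        return "almost perfect"
    if kappa > 0.60:
        return "substantial"
    if kappa > 0.40:
        return "moderate"
    if kappa > 0.20:
        return "fair"
    if kappa >= 0.0:
        return "slight"
    return "poor"


labels = [landis_koch_label(k) for k in weighted_kappas]

print("items with kappa above 0.40:", sum(k > 0.40 for k in weighted_kappas))  # 15
print("items labeled substantial:", labels.count("substantial"))              # 8
print("median kappa (rounded inputs):", median(weighted_kappas))
print("mean % exact agreement:", mean(pct_exact))
print("mean % agreement within 1 point:", mean(pct_within_one))
```

The two means round to 58% and 81%, matching the item-level averages reported in the text.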


Analysis of these APEEC data also suggests that the items were internally consistent (α = 0.77).

We established construct validity by comparing the APEEC with another observation measure of developmentally appropriate practice, the Profile, as well as with a teacher-report measure of developmentally appropriate practice, the TBPS. Finally, we compared the APEEC to a measure of teacher-child interactions, the CIS. To minimize dependency among the observational data, measures included in the correlational analysis were completed by different observers. Some measures were excluded from analysis due to missing data, resulting in varying numbers of classrooms in each analysis. The Pearson correlation between the APEEC and Profile total scores was 0.67 (N = 69). The correlation between the APEEC and TBPS-DA Practices scores was 0.55 (N = 68). The correlation between the APEEC and CIS total scores was 0.61 (N = 66). These moderately high correlations suggest that APEEC scores are a valid measure of developmentally appropriate practice.

3.2. Predictors of classroom quality

For the next set of analyses, we used hierarchical multiple regression to determine the extent to which observed classroom practices could be predicted by classroom and teacher characteristics. Class size, number of students with disabilities, and grade level (coded into two groups: K-1 vs. 2–3) constituted the classroom characteristics. The highest level of education (BA vs. MA), years of experience teaching K-3rd grade, beliefs in developmentally appropriate practice (TBPS-DA Beliefs), and beliefs in developmentally inappropriate practice (TBPS-DI Beliefs) constituted the teacher characteristics. Because we were especially interested in determining the extent to which teacher beliefs added to the prediction of classroom practices over and above the other characteristics, we separated teacher beliefs from the other teacher characteristics and added them as the third step in the model.
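Entering predictors in steps amounts to fitting a sequence of nested ordinary least squares models and recording the change in R² at each step. The following Python sketch illustrates the idea with synthetic data. The variable values, coefficients, and function names are invented for illustration and are not the study's data or analysis code. It uses only the standard library, solving the normal equations by Gaussian elimination, and it omits the significance test for each change in R².

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def r_squared(columns, y):
    """R^2 from an OLS fit of y on the predictor columns plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)]
           for p in range(k)]
    xty = [sum(row[p] * y[i] for i, row in enumerate(design)) for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def hierarchical_r2(blocks, y):
    """Enter predictor blocks in order; return (name, cumulative R^2, change)."""
    entered, previous, steps = [], 0.0, []
    for name, columns in blocks:
        entered.extend(columns)
        r2 = r_squared(entered, y)
        steps.append((name, r2, r2 - previous))
        previous = r2
    return steps


# Synthetic illustration only; these are NOT the study's data.
random.seed(0)
n = 60
grade = [random.choice([0, 1]) for _ in range(n)]        # 0 = K-1, 1 = 2-3
class_size = [random.randint(16, 30) for _ in range(n)]
education = [random.choice([0, 1]) for _ in range(n)]    # 0 = BA, 1 = MA
da_beliefs = [random.gauss(0, 1) for _ in range(n)]
practice = [4 - 0.5 * g + 0.8 * e + 0.4 * b + random.gauss(0, 1)
            for g, e, b in zip(grade, education, da_beliefs)]

steps = hierarchical_r2(
    [("classroom characteristics", [grade, class_size]),
     ("teacher characteristics", [education]),
     ("teacher beliefs", [da_beliefs])],
    practice,
)
for name, r2, delta in steps:
    print(f"{name}: R2 = {r2:.3f}, change in R2 = {delta:.3f}")
```

Because each model contains all predictors from the earlier steps, the changes in R² sum to the final model's R², and the change at the last step shows what the final block adds beyond everything entered before it.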
Classroom characteristics were entered first. Teacher characteristics were entered next, followed by teacher beliefs. A second regression was run to determine if there were interaction effects among the independent variables that accounted for a noteworthy and statistically significant portion of the variance in classroom practices. These interactions were added as the last (fourth) step in the regression model. None of the interactions accounted for a statistically significant or noteworthy portion of the variance, so they are not reported. The β reported for each variable is as if that variable were entered last in that step. Each step accounts for the variables entered in the previous step(s). The change in R² and its associated p value are reported for each step. Following the advice of the American Psychological Association’s Task Force on Statistical Inference, information about both statistical and practical significance is reported (Wilkinson et al., 1999).

Correlations among the independent and dependent variables are shown in Table 3. Correlations of 0.30 or higher are considered noteworthy and are statistically significant at p < .05.

Table 3
Pearson correlations among classroom practices, classroom characteristics, teacher characteristics, and teacher beliefs

| Variable | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| 1. APEEC | | | | | | | |
| 2. Grade | -.34 | | | | | | |
| 3. Class size | -.12 | .27 | | | | | |
| 4. No. of students with disabilities | .09 | -.03 | .12 | | | | |
| 5. Teacher education level | .43 | .03 | -.11 | .27 | | | |
| 6. Teacher years of experience | .06 | .22 | .14 | .22 | .25 | | |
| 7. TBPS-DA beliefs | .41 | -.28 | -.07 | -.17 | .19 | -.05 | |
| 8. TBPS-DI beliefs | -.43 | .23 | .10 | .11 | -.22 | .04 | -.35 |

Note: Correlations of .30 or higher are statistically significant at p < .05.

Classroom characteristics, teacher characteristics, and teacher beliefs accounted for 42% of the variance in observed classroom practices, as measured by the APEEC (see Table 4). In the first step of the model, classroom characteristics accounted for 13% of the variance, with grade accounting for most (10%) of the variance. Classrooms in K-1st grade were observed to be more individualized and developmentally appropriate, as measured by the APEEC, than classrooms in 2nd-3rd grade. The variables in this first step did not account for a statistically significant portion of the variance in APEEC scores.

When teacher characteristics were added to the model, they explained an additional, statistically significant 19% of the variance. Teacher education uniquely accounted for 17% of the APEEC variance; years of experience did not account for any notable or statistically significant part of the variance. Classrooms taught by teachers with a master’s degree were observed to be more developmentally appropriate, as measured by the APEEC, than those taught by teachers with less education.

After accounting for classroom characteristics and other teacher characteristics, teachers’ beliefs about developmentally appropriate and developmentally inappropriate practice explained an additional 11% of the variance in observed classroom practices. This change in R² was notable and statistically significant. Classrooms of teachers who reported beliefs

Table 4
Summary of hierarchical regression analysis for variables predicting the APEEC total mean (N = 60)

1. Classroom Characteristics Grade (K–1 v 2–3) Class size Number of children with disabilities 2. Teacher Characteristics Teacher education (BA v MA) Years of experience teaching K–3 3. Teacher Beliefs

R2

R2 inc.

.13

.13

.31*** .42*** Adj. R2 ⫽ .34

SE B a

␤a

sr2a

tb

⫺.49 .00 .00

.20 .04 .05

⫺.33 ⫺.04 .09

.10 .002 .008

⫺2.0* .06 .46

.67 .00

.18 .01

.45 .04

.17 .001

2.6* .39

.00 .00

.02 .02

.21 ⫺.24

.035 .05

1.8 ⫺2.1*

.19** .11*

TBPS-DA Beliefs TBPS-DI Beliefs Note: * p ⬍ .05, ** p ⬍ .01, *** p ⬍ .001 Reported for the first step in which the variable was entered. b The t statistic is reported for the full model. a

Ba

442

K.L. Maxwell et al. / Early Childhood Research Quarterly 16 (2001) 431– 452

more congruent with developmentally appropriate practice were observed to be more developmentally appropriate, as measured by the APEEC, than those whose beliefs were less developmentally appropriate. Classrooms of teachers who reported beliefs more congruent with developmentally inappropriate practice were observed to be less developmentally appropriate than those whose beliefs were not as developmentally inappropriate. The full model accounted for a statistically significant and noteworthy proportion of the variance in observed classroom practices, and each step accounted for a considerable amount of variance after controlling for the variables entered in the preceding steps (although the first step did not account for a statistically significant amount of variance). The squared semipartial correlations from the full model (not shown in Table 4) demonstrate that most of the APEEC unique variance (22%) was accounted for by education level (8%), TBPS-DI Beliefs (5%), grade (5%), and TBPS-DA Beliefs (4%). All but TBPS-DA Beliefs were statistically significant predictors in the full model (see t statistics in Table 4). It is likely that TBPS-DA Beliefs was not statistically significant in the full model because of the collinearity between it and TBPS-DI Beliefs (r ⫽ ⫺0.35). To test this, we reran the analysis entering just TBPS-DA Beliefs, not TBPS-DI Beliefs, in the final step and found that in the full model TBPS-DA Beliefs statistically significantly predicted 6% of APEEC unique variance (t ⫽ 2.1, p ⫽ .04). We also reran the analysis entering just TBPS-DI Beliefs in the final step and found that in the full model TBPS-DI Beliefs statistically significantly predicted 7% of the APEEC unique variance (t ⫽ ⫺2.4, p ⫽ .02). Thus, both developmentally appropriate and developmentally inappropriate beliefs predicted observed classroom practices.
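As a rough consistency check on the summary statistics in Table 4, the adjusted R² and the F statistic for each R² increment can be recomputed from the reported values. The sketch below uses the standard textbook formulas and is not the authors' computation. It assumes N = 60 and the number of predictors per step as listed in Table 4, and because the inputs are rounded to two decimals the resulting F values are only approximate.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a model with k predictors fitted to n cases."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def f_change(r2_increment, r2_after, n, k_after, k_added):
    """F statistic for the R^2 increment of a step that adds k_added predictors."""
    return (r2_increment / k_added) / ((1.0 - r2_after) / (n - k_after - 1))


N = 60  # classrooms in the regression, as stated in the Table 4 caption

# (step, cumulative R^2, R^2 increment, predictors added, predictors in model)
REPORTED_STEPS = [
    ("Classroom characteristics", 0.13, 0.13, 3, 3),
    ("Teacher characteristics", 0.31, 0.19, 2, 5),
    ("Teacher beliefs", 0.42, 0.11, 2, 7),
]

for step, r2, inc, added, total in REPORTED_STEPS:
    f_value = f_change(inc, r2, N, total, added)
    print(f"{step:<26} F({added}, {N - total - 1}) = {f_value:.2f}")

# Prints 0.34, matching the adjusted R^2 reported in Table 4.
print(f"Adjusted R2 for the full model: {adjusted_r2(0.42, N, 7):.2f}")
```

With seven predictors and 60 classrooms, the recomputed adjusted R² agrees with the reported value of .34, which supports reading the table as a seven-predictor model fitted to 60 classrooms.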

4. Discussion

4.1. Predicting developmentally appropriate practices

This study demonstrates that classroom characteristics, teacher characteristics, and teacher beliefs account for almost half (42%) of the variance in observed classroom practices. This is a much larger effect than that reported by Buchanan et al. (1998), whose model accounted for 5% of the variance in teacher-reported developmentally appropriate classroom practices. Our study’s use of an observational measure of practice as well as the inclusion of classrooms from multiple districts and a higher proportion of teachers with master’s degrees could account for the difference. It is encouraging to find a model that accounts for a high proportion of the variance in developmentally appropriate classroom practices. By identifying the factors that predict practices, we can be more effective in improving classroom practices. However, there is still a large amount of unexplained variance. Factors not included in this study, such as adult:student ratio or beliefs about how students learn, may help explain greater amounts of variance in classroom practices.

4.2. Important predictors of classroom practices

Grade level, education, and teacher beliefs were the strongest predictors of observed classroom practices. In this study, about one-fourth of the variance was uniquely explained by simply knowing whether a classroom was either K-1st grade or 2nd-3rd grade. As in previous studies, classrooms were less developmentally appropriate as grade increased (Buchanan et al., 1998; Holmes & Morrison, 1994; Vartulli, 1999). It is possible that the decline in developmentally appropriate practices was due to measurement issues rather than grade itself. However, we designed each item to reflect developmentally appropriate practices that would apply equally across the K-3rd grade levels. Thus, it is reasonable to conclude that grade level, or another factor associated with grade level (e.g., increased emphasis on end-of-grade testing), is related to developmentally appropriate classroom practices. Additional research is needed to understand better the mechanisms through which grade level impacts classroom practices.

Notably, teacher beliefs predicted classroom practices even after controlling for the effects of grade and education. The implications of this finding are mixed. On the one hand, it provides hopeful direction for improving practices. Although it may be difficult to minimize the grade level pressures that constrain developmentally appropriate practice (e.g., end-of-grade testing), it may be possible to improve practices by changing teachers’ beliefs. Previous research has shown the positive effects of changing preschool teacher beliefs and classroom practices (Cassidy et al., 1995). Teacher beliefs in both developmentally appropriate and inappropriate practice predicted similar amounts of variance in classroom practices. This supports earlier research suggesting that developmentally inappropriate beliefs are a separate dimension from developmentally appropriate beliefs rather than the opposite end of the same continuum (Buchanan et al., 1998; Charlesworth et al., 1993). It also suggests that any strategy to change beliefs should address both dimensions.
On the other hand, it is somewhat disheartening to consider the likelihood of changing teacher beliefs. Changing beliefs is difficult and time consuming. Although one would like to believe that changing beliefs will lead to a change in practice, previous research has demonstrated the opposite: changing practice will lead to a change in beliefs (Guskey, 1986). It is also sobering to realize that education and beliefs are not so closely tied together. Teacher beliefs predicted classroom practices independent of education level. Thus, improving education may not in itself bring about large changes in practices.

The importance of education should not be ignored, however. Teacher education mattered. The findings from this study add to the body of evidence suggesting that general education level is related to classroom practices in the elementary grades. Of the three important predictors (grade, education, beliefs), education seems the most amenable to change.

Finally, some variables thought to be important predictors of developmentally appropriate practice did not account for much variance. In both our study and the Buchanan et al. (1998) study, class size did not predict developmentally appropriate practices. It is possible that class size has to be considerably smaller to affect practices. The average class size in both studies was higher than the class size of 18 recommended by the U.S. Department of Education (http://www.ed.gov/offices/OESE/ClassSize). We also did not find a relationship between the number of children with disabilities and developmentally appropriate practices. The Buchanan et al. (1998) study found that having fewer children with disabilities in a classroom was related to teacher-reported developmentally inappropriate activities but was not associated with appropriate activities. We need more information about the relationship between classroom practices and the inclusion of children with disabilities.


4.3. The APEEC

The results of this study suggest that the APEEC is a valid measure of individualized and developmentally appropriate practice in K-3rd grade inclusive classrooms. The APEEC is one of a few observation measures designed specifically to measure developmentally appropriate classroom practices for older children (i.e., 5–8 years). To our knowledge, it is the only such measure designed specifically to be used in general education classrooms that include children with disabilities, as is common in many public schools (Early Childhood Follow-Through Research Institute, 1996). Both researchers and practitioners should find the APEEC useful for describing and better understanding classroom practices in the early elementary grades. The APEEC provides practitioners a starting point for discussions about current practices and ways of improving practices.

This is the first report of the development of a new instrument. As such, the findings from this study should be interpreted cautiously. Evidence of the APEEC’s discriminant validity has not yet been gathered. Also, data from more classrooms are needed before we can understand the factor structure of the APEEC. Now that there are data to support the reliability and validity of APEEC scores, it will be important to determine the relationship between the APEEC and children’s academic and social outcomes. Finally, as a global measure of quality, the APEEC does not measure all of the important factors of classroom quality, such as the curriculum content or particular instructional strategies. Researchers and practitioners are encouraged to supplement the APEEC with other tools. For instance, Fennema et al. (1996) described procedures used to assess math instruction. Additionally, the MS-CISSAR documents the frequency of particular teacher behaviors and classroom activities (see Greenwood, Carta, & Dawson, 2000, for a review).
Measures of both the global quality and the particular instruction are needed to understand how best to support children’s development and learning in the early elementary grades. Global quality measures can be helpful in describing classroom practices across grade levels. Measures of the specific content covered in classrooms are also needed in order to understand how both the global quality and the curriculum and instruction affect children’s learning. Better tools are needed to assess how children are taught particular skills and learn specific subject matter across the early elementary grades. Developing these tools will require an interdisciplinary approach that uses knowledge from developmental, cognitive, and educational psychology as well as early childhood education.

4.4. Implications

This study suggests three major practical implications about developmentally appropriate practices in the early elementary grades. First, teachers, supervisors, preservice educators, and in-service trainers now have a new tool to assess the quality of K-3rd grade classrooms. The APEEC would be a valuable part of a comprehensive teacher training or evaluation system to provide information about the global quality of classroom practices and as a possible way to structure supervisors’ classroom observations. Special educators and general education teachers could also use the APEEC to understand better the quality of classrooms that include children with disabilities. Some of the APEEC descriptors could help specialists identify particular areas to address when collaborating with general educators to best serve students with disabilities. Although not originally designed to be used in self-contained special education settings, the APEEC has been used successfully as a global measure of classroom quality in these classrooms as well (Symons, Clark, Roberts, & Bailey, 2000). In the Symons et al. study, classroom quality as measured by the APEEC was moderately associated with student engagement. Using the APEEC in both general education and special education settings could strengthen the quality of services for special education students.

Second, teacher beliefs are important in understanding classroom practices. Teacher training programs should address directly the beliefs of teachers in training and not presume that education about developmentally appropriate practice will change beliefs. The same is true for in-service professional development activities.

Third, education matters. We should support educational policies and hiring practices to ensure that teachers in the early elementary grades have master’s degrees or higher. Findings from the Vartulli (1999) study also suggest that we must ensure that these teachers have an early childhood education. Recruiting and keeping highly educated teachers likely will require complex solutions, including increased compensation. These findings about education and beliefs provide a needed buoy in the sea of critics who claim that developmentally appropriate practice is too difficult to implement in 2nd or 3rd grade. Even with all the factors associated with grade level, teachers in 2nd and 3rd grade who had a master’s degree or higher and who believed in developmentally appropriate practices were more likely to implement developmentally appropriate practice.

The findings from this study also have theoretical implications. Everyone in our field agrees on the need for a high quality education.
Far fewer agree on the definition of “high quality.” The National Association for the Education of Young Children has described their view of high quality education, called “developmentally appropriate practice,” for young children birth through 8 years (Bredekamp & Copple, 1997). Research on developmentally appropriate practice poses many questions about our understanding of this concept. The findings from this and other studies have shown consistently that classroom practices are not as developmentally appropriate (i.e., of high quality) as many teachers and early childhood proponents would like. Additionally, the implementation of developmentally appropriate practices seems to diminish across grade level. Is this decline a measurement artifact in that measurement developers have not been able to define and measure adequately the concept of developmentally appropriate practice at the upper ends of the age continuum? Or does it really become difficult to implement developmentally appropriate practices in 3rd grade? How well does the concept apply beyond kindergarten? Finally, how do high quality classroom practices for children 9 and older differ from high quality (i.e., developmentally appropriate) classroom practices for children 8 and younger? As we work to improve education and, more specifically, implement the concept of developmentally appropriate practice, it may help to ponder these questions.

The last question about how high quality practices differ for older versus younger children seems particularly helpful in understanding developmentally appropriate practices. Many guidelines for developmentally appropriate practice seem to apply equally to children of all ages. Would learners of all ages not benefit from being actively engaged, trying things out for themselves (i.e., using hands-on materials), playing an active role in their own learning, or having help to see how a new idea or concept relates to their lives? By marketing the concept of developmentally appropriate practice as one that applies only to young children, supporters of developmentally appropriate practice may inadvertently be doing more harm than good. People may dismiss the ideas as relevant only for very young children when, in fact, many of the principles apply to children and adults of all ages. Instead of grappling with the broad concept of developmentally appropriate practice, it might be more beneficial to focus attention and research on some of the key components of developmentally appropriate practice that apply to children of all ages. This narrowing of focus may, in fact, help us better understand the concept of developmentally appropriate practice, or articulate new concepts. Researchers and practitioners will need to work together to move the field forward in its thinking about developmentally appropriate classroom practices in the early elementary grades beyond kindergarten.


Appendix: Sample APEEC items

APEEC Item 5. Use of materials (observation and interview)

Rating scale: Inadequate (1), 2, Minimal (3), 4, Good (5), 6, Excellent (7)

Inadequate (1)
1.1 Minimal hands-on materials[a] are in the classroom. (O)
1.2 All activities are paper and pencil tasks. (O)

Minimal (3)
3.1 Hands-on materials[a] are used in at least one subject area to appropriately support child learning.[b]* (O)
3.2 The teacher ensures that children use materials properly (e.g., teacher shows children how to use a microscope, helps children learn rules of game, reminds children of proper use).* (O)

Good (5)
5.1 Many different hands-on materials[a] in at least two subject areas are in the classroom. (O)
5.2 Hands-on materials or other relevant materials[c] are used by most children in at least two subject areas to appropriately support child learning.[b] (O)

Excellent (7)
7.1 All children use hands-on materials[a] for a majority of the day. (O)
7.2 Hands-on and other relevant materials[c] are used by most children in all subject areas to appropriately support child learning.[b]* (O, I)

[a] Examples of hands-on materials: art supplies, games, coins, blocks, unifix cubes, scales, three-dimensional shapes, counters, rulers, puppets, plants.
[b] Examples of how materials are used to support child learning: math cubes used for solving math problems, art materials used for creativity, scales used for testing hypotheses about weights of objects, blocks and interlocking pieces to learn basic building and physics concepts, math games to teach relevant math concepts, and live animals and plants to teach growth. Because different materials may be needed to support some children’s learning (e.g., children with lower level math skills need less difficult math games), look for materials at various skill levels. To be considered as supporting children’s learning, teachers should redirect children to more appropriate materials if they are using an inappropriate material, or there should be evidence in the lesson plans that the materials used were selected for a specific purpose.
[c] Other relevant materials include a variety of children’s books (e.g., children’s literature, library books, fiction and non-fiction books) and paper and pencil when they are used to foster activities about children’s real-life experiences (e.g., teaching writing skills by asking children to write creative stories, make journal entries, or write poetry as opposed to copying sentences).
* Descriptor 3.1: Score as “true” even if only a few children use hands-on learning materials.
* Descriptor 3.2: Score as “true” if children are using materials properly, even without teacher intervention. Score as “not true” if children never use hands-on materials.
* Descriptor 7.2: All subject areas include math, language arts, science, and social studies. To score this descriptor as “true,” the teacher must give at least two examples of children’s use of materials in each subject area.


APEEC Item 8. Teacher-child language (observation)

Rating scale: Inadequate (1), 2, Minimal (3), 4, Good (5), 6, Excellent (7)

Inadequate (1)
1.1 Almost all teacher questions have one correct answer or require rote memorization of facts. (O)
1.2 Almost all child language is teacher-directed (e.g., teacher chooses topic, children speak primarily in response to teacher). (O)

Minimal (3)
3.1 Teacher shows interest in children’s statements or questions. (O)
3.2 The teacher’s feedback to children is constructive, not critical. (O)
3.3 Children have some opportunities to talk with their peers about classroom activities. (O)

Good (5)
5.1 Some teacher questions require something other than one correct answer or rote memorization of facts.[a] (O)
5.2 At least a few times a day, the teacher prompts children to elaborate[b] on their initial statements. (O)
5.3 Children have many opportunities to talk with their peers about classroom activities. (O)

Excellent (7)
7.1 Many times a day, the teacher prompts children to elaborate[b] on their initial statements. (O)
7.2 The teacher has some informal conversations[c] with children. (O)

[a] Examples of teacher questions requiring more than one correct answer: What do you think will happen next? How could you solve this problem? What are some ways we can add numbers to make 10? What words start with the letter L? If you lived in 1900, what would your life have been like?
[b] Elaboration requires the teacher to ask follow-up questions of a child to elicit additional statements from him or her. Asking multiple questions to a group of children is not considered elaboration. Examples of elaborations: Teacher: What is the answer to this question? Child: 4. Teacher: How did you know that?; Teacher: What did Joe do next in the story? Child: He ran away. Teacher: Why do you think he ran away?
[c] A conversation is not simply asking and answering a question, giving directions, or clarifying a task.


APEEC Item 9. Instructional methods (observation)

Rating scale: Inadequate (1), 2, Minimal (3), 4, Good (5), 6, Excellent (7)

Inadequate (1)
1.1 Whole group instruction is used all day. (O)

Minimal (3)
3.1 The teacher uses at least two different teaching methods.[a] (O)
3.2 Some activities or materials are adapted[b] for individual children as needed.* (O)

Good (5)
5.1 Shared learning[a] is used at least once a day. (O)
5.2 Most activities are adapted[b] for individual children as needed. (O)
5.3 The teacher asks children to explain their answers at least a few times a day.* (O)

Excellent (7)
7.1 The teacher uses at least two teaching methods within at least two subject areas.[a] (O)
7.2 The teacher facilitates group discussions[c] among children. (O)

[a] Teaching methods include whole group instruction (e.g., lecturing, giving directions, giving feedback to children during teacher-directed activities in which all children are working on the same thing, demonstrating new tasks in a large group setting), small group instruction (e.g., teacher-led reading groups), one-on-one instruction (e.g., teacher works with an individual child), self-instruction (e.g., children directing their own play with materials, reading alone, working on an educational computer program), teacher facilitation (e.g., teacher expands on child-directed activities), and shared learning in which children work together to complete an activity (e.g., cooperative learning, games, peer tutoring).
[b] Examples of material and activity adaptations: alternate keyboards for children with physical disabilities, reading materials available for children at different reading levels, large print materials for children with visual impairments, shorter assignments for children with developmental delays, peer assistance, ability-based reading groups, materials available in children’s primary language.
[c] Group discussions go beyond the teacher asking and children answering questions. In group discussions, children present their opinions, consider different issues of a problem, talk about pros and cons, and so forth. No one person (e.g., teacher, child) is the primary source of information during group discussions. Group discussions may be among the whole class or a smaller group of children.
* Descriptors 3.2, 5.2: This item is based on observation only. Pay attention to the number of materials that are adapted as needed. It is not enough to adapt just one material if, in fact, several need to be adapted. Also remember that adaptations need to be made for any child whose skills are above or below the level required by the material, not just children with disabilities. Ability-based groups should be considered as an adapted activity.
* Descriptor 5.3: Remember that asking for an explanation is always an elaboration, but elaborations are not always explanations.


APEEC Item 11. Children’s role in decision-making (observation and interview)

Rating scale: Inadequate (1), 2, Minimal (3), 4, Good (5), 6, Excellent (7)

Inadequate (1)
1.1 Children never make choices[a] about their classroom activities. (O)
1.2 Children never choose whom they sit by, work or play with in the classroom. (O)

Minimal (3)
3.1 Children make choices[a] in the classroom at least twice a day. (O)
3.2 Children choose whom they sit by, work or play with in the classroom at least twice a day. (O)

Good (5)
5.1 At least once a day, children decide which activity to do (e.g., choose an activity in a center, decide between writing or playing a math game).* (O)
5.2 Children help make at least three different decisions that affect the entire class or a group of children in the class.[b] (O, I)

Excellent (7)
7.1 Children make choices[a] many times a day. (O)
7.2 Children help make decisions[b] at least once a month that affect the entire class or a group of children in the class. (I)

[a] Child choice may include choices between activities (e.g., whether to draw a picture or read a book) or between teacher-identified options (e.g., children must write a story, but they can decide the topic; children can select a math game to play or a book to read after completing work).
[b] The intent of this is for children to make decisions together. Do not include decisions that one child makes that may affect the entire class or group of children (e.g., at lunch one child allowed to choose four friends to eat with). Examples of decisions made by a group: class rules, topics of study, field trips, books to be read aloud, projects to complete, games to play.
* Descriptor 5.1: This does not include choosing “filler” activities until it is time to move to the next major activity.

Note: The sample items in this appendix were reprinted by permission of the publisher from Hemmeter M, Maxwell K, Ault M, Schuster J, Assessment of Practices in Early Elementary Classrooms (New York: Teachers College Press, ©2001 by Teachers College, Columbia University. All rights reserved.), pp. 13, 16, 17, & 19. To order, please call 800-575-6566 or visit www.teacherscollegepress.com

References

Abbott-Shim, M., Sibley, A., & McCarty, F. (1997). Developmentally appropriate practices across the grade levels. Anaheim, CA: The National Association for the Education of Young Children Conference. Abbott-Shim, M., Sibley, A., & Neel, J. (1998). Psychometric report of the assessment profile for early childhood programs: research version for the national transition demonstration project. Atlanta, GA: Quality Assist. Abbott-Shim, M., & Sibley, A. (1998). Assessment profile for early childhood programs: research version. Atlanta, GA: Quality Assist. Abbott-Shim, M., Sibley, A., & Neel, J. (1992). Assessment profile for early childhood programs: research manual. Atlanta, GA: Quality Assist. Alston, E., Brinly, B., Carr, A., Deaton, S., Dutton, P., Little, D., & Steinberg, E. (1998). Kentucky education reform act: a citizen’s handbook. Frankfort, KY: Legislative Research Commission. Arnett, J. (1989). Caregivers in day-care centers: does training matter? Journal of Applied Developmental Psychology, 10, 541–552. Bredekamp, S. (1987). Developmentally appropriate practice in early childhood programs serving children from birth through age 8. Washington, DC: National Association for the Education of Young Children. Bredekamp, S., & Copple, C. (1997). Developmentally appropriate practice in early childhood programs. Washington, DC: National Association for the Education of Young Children.


Bryant, D. M., Clifford, R. M., & Peisner, E. S. (1991). Best practices for beginners: developmental appropriateness in kindergarten. American Educational Research Journal, 28, 783–803.
Buchanan, T. K., Burts, D. C., Bidner, J., White, F., & Charlesworth, R. (1998). Predictors of the developmental appropriateness of the beliefs and practices of first, second, and third grade teachers. Early Childhood Research Quarterly, 13, 459–483.
Burt, L. M., Sugawara, A. I., & Wright, D. (1993). A scale of primary classroom practices (SPCP). Early Child Development and Care, 84, 19–36.
Burts, D. C., Hart, C. H., Charlesworth, R., Fleege, P. O., Mosley, J., & Thomasson, R. H. (1992). Observed activities and stress behaviors of children in developmentally appropriate and inappropriate kindergarten classrooms. Early Childhood Research Quarterly, 7, 297–318.
Buysse, V., Wesley, P. W., Bryant, D., & Gardner, D. (1999). Quality of early childhood programs in inclusive and noninclusive settings. Exceptional Children, 65, 301–314.
Carta, J. J., Schwartz, I. S., Atwater, J. B., & McConnell, S. R. (1991). Developmentally appropriate practice: appraising its usefulness for young children with disabilities. Topics in Early Childhood Special Education, 11, 1–20.
Cassidy, D. J., Buell, M. J., Pugh-Hoese, S., & Russell, S. (1995). The effect of education on child care teachers’ beliefs and classroom quality: year one evaluation of the TEACH Early Childhood Associate Degree Scholarship Program. Early Childhood Research Quarterly, 10, 171–183.
Charlesworth, R., Hart, C. H., Burts, D. C., & Hernandez, S. (1991). Kindergarten teachers’ beliefs and practices. Early Child Development and Care, 70, 17–35.
Charlesworth, R., Hart, C. H., Burts, D. C., Thomasson, R. H., Mosley, J., & Fleege, P. O. (1993). Measuring the developmental appropriateness of kindergarten teachers. Early Childhood Research Quarterly, 8, 255–276.
Cohen, J. (1968). Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin, 70, 213–220.
Cost, Quality, & Outcomes Study Team. (1995). Cost, quality, and child outcomes in child care centers, technical report. Denver: Department of Economics, Center for Research in Economic and Social Policy, University of Colorado at Denver.
Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. New York: Holt, Rinehart and Winston.
Dunn, L. (1993). Proximal and distal features of day care quality and children’s development. Early Childhood Research Quarterly, 8, 167–192.
Early Childhood Follow-Through Research Institute (1996). National surveys of practices in early elementary schools. Chapel Hill, NC: Frank Porter Graham Child Development Center, University of North Carolina at Chapel Hill.
Fennema, E., Carpenter, T. P., Franke, M. L., Levi, L., Jacobs, V. R., & Empson, S. B. (1996). A longitudinal study of learning to use children’s thinking in mathematics instruction. Journal for Research in Mathematics Education, 27, 403–434.
Frede, E. C. (1995). The role of program quality in producing early childhood program benefits. In R. E. Behrman (Ed.), The future of children: long-term outcomes of early childhood programs (Vol. 5) (pp. 115–132). Los Altos, CA: The Center for the Future of Children.
Greenwood, C. R., Carta, J. J., & Dawson, H. (2000). Ecobehavioral assessment systems software (EBASS): a system for observation in education settings. In T. Thompson, D. Felce, & F. J. Symons (Eds.), Behavior observation: technology and applications in developmental disabilities (pp. 220–252). Baltimore, MD: Paul H. Brookes.
Guskey, T. R. (1986). Staff development and the process of teacher change. Educational Researcher, 15, 5–12.
Harms, T., & Clifford, R. (1980). Early childhood environment rating scale. New York: Teachers College Press.
Harms, T., Clifford, R., & Cryer, D. (1998). Early childhood environment rating scale-revised edition. New York: Teachers College Press.
Hart, C. H., Burts, D. C., & Charlesworth, R. (Eds.) (1997). Integrated curriculum and developmentally appropriate practice: birth to age 8. New York: SUNY Press.

Hemmeter, M. L., Maxwell, K. L., Ault, M. J., & Schuster, J. W. (2001). Assessment of practices in early elementary classrooms. New York: Teachers College Press.
Homes, J., & Morrison, N. (1994). Determining continuity in the primary grades with regard to developmentally appropriate teaching practices. (ERIC Document Reproduction Service No. ED 382 369)
Huffman, L. R., & Speer, P. W. (2000). Academic performance among at-risk children: The role of developmentally appropriate practices. Early Childhood Research Quarterly, 15, 167–184.
Hyson, M. C., Hirsh-Pasek, K., & Rescorla, L. (1990). The classroom practices inventory: an observation instrument based on NAEYC’s guidelines for developmentally appropriate practices for 4 and 5 year old children. Early Childhood Research Quarterly, 5, 475–494.
Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33, 159–174.
Oakes, P. B., & Caruso, D. A. (1990). Kindergarten teachers’ use of developmentally appropriate practices and attitudes about authority. Early Education and Development, 1, 445–457.
Odom, S. L., & McLean, M. E. (1996). Early intervention/early childhood special education: Recommended practices. Austin, TX: Pro-Ed.
Peisner-Feinberg, E. S., & Burchinal, M. R. (1997). Relations between preschool children’s child-care experiences and concurrent development: the cost, quality, and outcomes study. Merrill-Palmer Quarterly, 43, 451–477.
Phillips, D. A., & Howes, C. (1987). Indicators of quality in child care: review of research. In D. A. Phillips (Ed.), Quality in child care: what does research tell us? (pp. 1–20). Washington, DC: National Association for the Education of Young Children.
Salisbury, C. L., Mangino, M., Petrigala, M., Rainforth, B., Syrca, S., & Palombaro, M. M. (1994). Promoting the instructional inclusion of young children with disabilities in the primary grades. Journal of Early Intervention, 3, 311–322.
Sherman, C. W., & Mueller, D. P. (1996, June). Developmentally appropriate practice and student achievement in inner-city elementary schools. Washington, DC: Head Start’s Third National Research Conference.
Stipek, D. J., & Byler, P. (1997). Early childhood education teachers: do they practice what they preach? Early Childhood Research Quarterly, 12, 305–325.
Stipek, D., Daniels, D., Galluzzo, D., & Milburn, S. (1992). Characterizing early childhood education programs for poor and middle-class children. Early Childhood Research Quarterly, 7, 1–19.
Symons, F., Clark, R. D., Roberts, J. P., & Bailey, D. B. (2001). Classroom behavior of elementary school-aged boys with Fragile X Syndrome. Journal of Special Education, 34, 194–202.
Vartulli, S. (1999). How early childhood teacher beliefs vary across grade level. Early Childhood Research Quarterly, 14, 489–514.
Whitebook, M., Howes, C., & Phillips, D. (1989). Who cares? Child care teachers and the quality of care in America. (Final report of the National Child Care Staffing Study). Oakland, CA: Child Care Employee Project.
Wilkinson, L., & the Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: guidelines and explanations. American Psychologist, 54, 594–604.