Early Childhood Research Quarterly 19 (2004) 569–587
A new measure for assessing developmentally appropriate practices in early elementary school: A Developmentally Appropriate Practice Template

M. Lee Van Horn a,∗, Sharon L. Ramey b

a Department of Psychology, Barnwell College, University of South Carolina, Columbia, SC 29208, USA
b Center for Health and Education, Georgetown University, Washington, DC, USA
Abstract

This study examines the psychometric properties of an observer measure of developmentally appropriate practices (DAP) for early elementary school classrooms, known as A Developmentally Appropriate Practices Template (ADAPT). Using a sample of 854–1511 first through third grade classrooms in 207–354 schools across 3 years, the study evaluates the reliability and content validity of the ADAPT as a measure of DAP; examines the ADAPT ratings descriptively, including the extent to which classrooms within the same schools are similar; evaluates the factor structure of the ADAPT; tests the invariance of that structure across treatment conditions; and assesses the convergent validity of the measure. The ADAPT is found to have a reliable, stable, and consistent factor structure consisting of three distinct but correlated factors: integrated curriculum, a social/emotional emphasis, and child-centered approaches. The ADAPT was also found to be moderately related to another observational measure of the broader classroom environment. There was a large degree of similarity in DAP among classrooms within schools, affirming the need for a multilevel framework to assess these practices.
© 2004 Elsevier Inc. All rights reserved.
∗ Corresponding author.
E-mail address: [email protected] (M. Lee Van Horn).

0885-2006/$ – see front matter © 2004 Elsevier Inc. All rights reserved. doi:10.1016/j.ecresq.2004.10.002

Since the National Association for the Education of Young Children (NAEYC) first published the Developmentally Appropriate Practices (DAP) Guidelines in 1987 (Bredekamp, 1987), later revised by Bredekamp and Copple (1997), the day-to-day experiences of children have been affected nationwide. Over a million copies of the guidelines have been sold, and many teachers and early childhood professionals have, to some extent, implemented DAP in their classrooms. While there is not unanimous agreement among professionals about the merits of DAP, most of the criticisms have been confined to theoretical problems with the guidelines, and there have been only a limited number of empirical studies assessing the impact of DAP on children (Jones & Gullo, 1999). One of the barriers to such studies is a lack of psychometrically validated measures of DAP. The purpose of this study is to evaluate the psychometric properties of a measure of DAP in elementary schools and to better understand DAP in real-world settings. We discuss statistical and methodological issues involved in measuring DAP, review other measures of DAP, and then present psychometric analyses of a previously developed measure using a large nationwide data set.
1. Methodological and statistical issues with the analysis of nested data

Analysis of nested designs requires analytical techniques that account for the non-independence of observations (Bryk & Raudenbush, 1992; Heck & Thomas, 2000; Hox, 1995, 1998), such as classrooms within the same school being more similar to one another than classrooms in different schools. As Sirotnik (1980) noted, investigators rarely apply nested data analysis techniques when evaluating the psychometric validity of instruments, an observation that remains true 20 years later. The critical issue is that the observed variance of an item or set of items in a nested design is due to more than one source. A rating of DAP, for example, may include variance due to both the classroom setting and the larger school context. Assessing reliability adequately therefore involves estimating the variance in the construct of interest at multiple levels (Muthén, 1994). Ignoring the multilevel nature of a construct such as DAP yields results representing an unspecified combination of the between and within group comparisons (Sirotnik, 1980).

Nested data require multilevel analyses only if there is an effect of the nesting. For example, if there is no true relationship between school and classroom level DAP (i.e., if classrooms within schools are actually independent), then a nested design would not be needed (Hox, 1998). Often this issue is addressed by calculating intra-class correlation coefficients (ICCs), which assess the proportion of variance in an item due to school. ICCs help identify the need for multilevel analysis and may also provide information about how DAP is implemented in educational settings. To date, no studies have directly tested the similarity of classrooms within schools, despite the practical and policy implications that would derive from such analyses. Specifically, if levels of DAP in classrooms within a school are similar, then educational continuity may be enhanced for young learners over the early elementary years. High ICCs would also be consistent with the conclusion that there is something about schools that influences classroom level teaching practices, which is important to know when planning for system level changes.

Although multilevel psychometric analyses are useful, they are not always easy to apply. Most factor analytic procedures require relatively large samples to yield stable and reliable estimates. Exploratory factor analyses (EFA) are recommended to have a minimum sample size of 150 in the optimal case in which factors are clearly defined; further, confirmatory factor analyses (CFA) typically require 5–20 observations per item (McGrew, Gilman, & Johnson, 1992; Tabachnick & Fidell, 1996). The need for a large sample expands proportionally when multilevel analyses are appropriate: one must have both adequate sample sizes within clusters and an adequate number of clusters (Hox, 1998). Conducting adequate psychometric analyses of a tool intended to measure a key construct such as DAP is thus an expensive and time-consuming proposition. The present study represents an exceptional opportunity to conduct multilevel psychometric analyses because it was embedded in a large, multi-site, randomized longitudinal trial designed in part to promote DAP in half of the schools and classrooms.
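The ICC screening just described can be sketched concretely. The following minimal example fits a random-intercept model for a single item and computes the share of variance attributable to schools; the file name and column names are hypothetical, not from the study:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical layout: one row per classroom, with a school identifier and
# the classroom's 1-5 rating on a single DAP item.
df = pd.read_csv("dap_ratings.csv")  # columns: school, item_score

# Random-intercept (variance components) model:
# item_score = grand mean + school effect + classroom residual.
fit = smf.mixedlm("item_score ~ 1", data=df, groups=df["school"]).fit()

tau_00 = fit.cov_re.iloc[0, 0]  # between-school variance
sigma2 = fit.scale              # within-school (classroom-level) variance
icc = tau_00 / (tau_00 + sigma2)
print(f"ICC = {icc:.2f}")       # proportion of item variance due to schools
```

An ICC near zero would justify single-level analyses; values of the size reported later in this paper (.12–.44) would not.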
1.1. Measures of DAP

The definition of DAP proposed in the NAEYC guidelines (Bredekamp & Copple, 1997) is generally considered the standard (Fowell & Lawton, 1993; Walsh, 1991). There is not, however, a clear standard that dictates how to assess whether a classroom achieves the DAP ideals. For scientists and educators alike, a major issue is how to measure DAP. Evaluations of the effects of DAP (especially regarding educational benefits for learners and classroom management benefits for the teacher) depend upon having a valid and reliable measure of DAP itself.

One line of research has examined "teacher beliefs" about DAP and their relationship to student performance. Several tools measuring teacher beliefs about DAP have been developed (Charlesworth, Hart, Burts, & Hernandez, 1991) and are considered a proxy for measures of classroom practices. Charlesworth et al. and others, however, have found only a modest relationship between teacher beliefs and teachers' self-reported use of DAP in the classroom. Teachers' reported use of DAP and observational measures of DAP have been found to be related (Burts, Hart, & Kirk, 1990; Charlesworth, Hart, Burts, & Hernandez, 1990; Charlesworth et al., 1991; Charlesworth, Hart, Burts, & Thomasson, 1993), but these studies include only teachers with the highest and lowest levels of self-reported DAP use. This raises the question of whether the majority of classrooms (i.e., classes not extremely high or low on DAP) could be reliably indexed by teachers' self-report alone. This paper focuses on observer measures of DAP; further work is needed to establish whether teachers' self-reports can be a viable alternative.

Another measurement issue is whether DAP ratings or observations should be treated as continuous measures or used to categorize classrooms as developmentally appropriate versus "inappropriate." Researchers commonly use these categories, despite the lack of empirical basis and despite DAP supporters' belief that the practices operate on a continuum (Bredekamp & Copple, 1997; Buchanan, Burts, Bidner, White, & Charlesworth, 1998; Fowell & Lawton, 1993). The limited research available cannot, at this time, provide an empirical basis for establishing a clear threshold for such categorization. Categorization also eliminates much of the variance in DAP, resulting in a less powerful test of its effectiveness. This study treats DAP as a continuous measure.

Five observer measures of DAP have been previously reported. The first, the Assessment Profile for Early Childhood Programs (Abbott-Shim & Sibley, 1992), was developed in the 1970s to measure the quality of preschool environments and was modified more recently for use in early elementary school classrooms (Huffman & Speer, 2000). Because the Assessment Profile was not developed specifically to measure DAP, many items are not congruent with the NAEYC's guidelines. An evaluation of the content of the Assessment Profile suggests that it measures constructs overlapping with, but distinct from, DAP and should not be used to make direct inferences about DAP.

The Classroom Practices Inventory (CPI; Hirsh-Pasek, Hyson, & Rescorla, 1990; Hyson, Hirsh-Pasek, & Rescorla, 1990) was developed to measure one key aspect of DAP, "informal" instructional style (versus "formal"), in classrooms with 4–5-year-old children. The CPI consists of 26 items; 20 are based on the 1987 NAEYC guidelines and the other 6 measure the social and emotional environment. Unfortunately, psychometric analyses of the CPI combined multiple ratings per classroom. This overestimates the strength of the factor structure and the internal reliabilities reported, since non-independence of observations will inflate the observed relationships between items.

The Checklist for Rating Developmentally Appropriate Practice in Kindergarten Classrooms is a 24-item instrument based on the 1987 NAEYC guidelines (Charlesworth et al., 1990, 1993). This measure has only been used by its authors to estimate the validity of their teacher self-report scale.
Stipek and her coworkers (Stipek, Daniels, Galluzzo, & Milburn, 1992; Stipek, Feiler, Daniels, & Milburn, 1995; Stipek et al., 1998) developed a tool to measure another key aspect of DAP: the extent to which a classroom is "didactic" versus "child-centered." The revised instrument has 31 items and has been tested in preschool and kindergarten classrooms. The measure correlates positively with scores on the Early Childhood Environmental Rating Scale (Harms & Clifford, 1980) and with the CPI. Psychometric analyses, however, were conducted with relatively small samples (N = 60 or less) and ignored the nesting of classrooms within schools.

Finally, the Assessment of Practices in Early Elementary Classrooms (APEEC; Hemmeter, Maxwell, Ault, & Schuster, 2001) was designed to assess the use of DAP and individualized instruction in kindergarten through third grade. The APEEC is an observer measure in which 16 items that map onto the DAP criteria are rated according to detailed descriptors. The psychometric report on the APEEC used ratings of 69 classrooms and found the measure to have low to moderate reliability (kappas ranged from .39 to .78) and to be moderately correlated with other measures of classroom quality (e.g., the Assessment Profile) and with teacher ratings of DAP (Maxwell, McWilliam, Hemmeter, Ault, & Schuster, 2001). Although in some schools more than one teacher was recruited, the analyses did not model the nested data structure. The factor structure of the APEEC has not been assessed.

Only two of these measures were developed for use with an early elementary school population. There are potential differences between implementing DAP in grade school as opposed to preschool (Buchanan et al., 1998; Maxwell et al., 2001). The main differences concern changing developmental stages and the different demands placed on children by preschool and early elementary school. For example, as curriculum and learning specific content become more important, and as children become more able to sit through didactic and worksheet-type activities, these activities may be used much more often. The two measures of DAP that were formulated for grade school (the revised Assessment Profile and the APEEC) and the new measure assessed in this study reflect such differences in normative activities. One contribution of the present measure is that it is one of only two measures designed specifically to assess DAP through third grade. Although the DAP guidelines are recommended for children through third grade, there is very little research currently assessing DAP effectiveness for these children. The present measure should facilitate investigations of DAP effectiveness for children through the third grade.

An aspect of DAP that has not been examined in depth with any of these measures is the extent to which there are different components of DAP. In the NAEYC guidelines, DAP are referred to as a set of practices which promote positive development, and there has been no attempt to assess whether different practices can be grouped together. One of the goals of this paper is to examine whether there are different types of practices underlying the items measuring DAP.

1.2. Research goals

The current study extends earlier research related to DAP assessment by reporting on a new observational rating tool for elementary school classrooms, A Developmentally Appropriate Practices Template (ADAPT), which was administered as part of a large national study.
The first goal of the study is to assess the content validity of the ADAPT as a measure of the 1997 DAP guidelines. The ADAPT was developed using the 1987 NAEYC DAP guidelines, but to be a useful measure of DAP it needs to assess the revised guidelines adequately as well. The second goal of this study is to evaluate the basic measurement properties of the ADAPT, including the distributions of each ADAPT item and the intra-class correlation coefficients (ICCs). This provides data on the clustering of ADAPT scores within schools
and the distribution of DAP in this population of classrooms. The third goal is to evaluate the construct validity of the measure by assessing the correlations of DAP scores for the same classrooms over time. Assuming that teachers' classroom practices are consistent over time, an assessment of consistency in DAP over time provides further evidence of the validity of the measure. The final goal of this study is to evaluate the convergent validity of the ADAPT by assessing its relationship with the Assessment Profile.
2. Methods

The data were collected as part of the National Head Start Public School Early Childhood Transition Demonstration Project, a 6-year longitudinal study following the transition to grade school of former Head Start and non-Head Start children through third grade. In 1992 and 1993, two cohorts of children began participating in the project and were each followed for 4 school years. The Transition Demonstration Project was a 31-site randomized trial funded by the U.S. Congress to improve former Head Start children's academic, social, and health outcomes by providing comprehensive Head Start-like services through the third grade (for a full description of the study, see Ramey, Ramey, & Phillips, 1996; Ramey et al., 2001). As part of this study, teachers in demonstration schools received training on DAP. For the purposes of this study, differences between demonstration and comparison teachers were not of interest; analyses reported later examined whether the covariance matrix, and therefore the factor structure of the ADAPT items, differed between groups in order to establish whether the results generalize across the entire sample.

2.1. Subjects

The subjects in the current study are classrooms nested within schools. Schools from each site agreed to participate in the study before being randomized into demonstration and comparison groups. All classrooms containing former Head Start students were assessed and included in the analyses. The schools cover every major area of the country, including rural and urban areas, and have ethnically diverse compositions. In the third year of this study, the local site Principal Investigators became concerned about the need to better evaluate DAP in the classroom, particularly in grades 2 and 3. This led to the development of a new tool with committee input and Margo Gottlieb assuming the lead authorship.¹ Thus, following deletion of any class with missing ADAPT data, which resulted in a loss of about 3% of the classrooms, data on the ADAPT are available for 3 years as follows: 1048 classrooms within 207 schools, 1511 classrooms in 354 schools, and 854 classrooms in 295 schools.

¹ The measure was developed by Margo Gottlieb and Sue Rasher, with feedback from the investigators from the other study sites. Although the authors of this study participated in the National Transition study, we did not play a significant role in the development of the ADAPT and have no financial interest in seeing it used.

2.2. Measure

2.2.1. Assessment Profile

The Assessment Profile for Early Childhood Programs (Abbott-Shim & Sibley, 1992) was administered by trained observers. The Assessment Profile has a long history of use in preschools, but was modified for use with first through third grade classrooms for the purposes of this study.
The psychometric properties of the Assessment Profile in this data set and for this age range were established by the original authors of the instrument (Abbott-Shim, Sibley, & Neel, 1998). Item Response Theory was used to scale the instrument and create five subscale scores – Interacting, Learning environment, Scheduling, Individualizing, and Curriculum – from a total of 60 items. Each spring, trained observers spent a total of one to two hours (in 15-min intervals) observing each classroom and scoring both the Assessment Profile and the ADAPT.

2.2.2. ADAPT

The ADAPT was administered only in the last 3 years of the study, corresponding to the second and third grades for cohort I and the first, second, and third grades for cohort II. Therefore, the current study includes first grade classrooms from cohort II only and second and third grade classrooms from both cohorts. Each site was able to choose whether to administer the ADAPT. In all, 25 of the 31 sites used the ADAPT, although not all did so in each year.

The ADAPT comprises 19 items that were derived from an item analysis of the 1987 NAEYC guidelines. Table 2 includes labels for each item and descriptions of the criteria that anchor the item-specific scales (ranging from most developmentally appropriate to least developmentally appropriate). Although the revised NAEYC guidelines had not been established at the time the ADAPT was developed, concerns with the original 1987 guidelines were known and taken into account. For instance, the ADAPT included a "multiculturalism" item, a topic later reflected in the revised NAEYC guidelines. The items were initially grouped conceptually into three scales of six items each – Curriculum and instruction, Classroom management, and Interaction – plus one final separate global rating of DAP. The ADAPT is similar in its rating system to the Early Childhood Environmental Rating Scale (Harms & Clifford, 1980) and the APEEC in that items are rated on a 5-point scale with anchors provided for each point appropriate to the particular DAP guideline. Supplemental descriptions and criteria for each item are provided to raters to help clarify ambiguous scores and improve reliability. Experts familiar with the NAEYC guidelines reviewed each preliminary item for content validity, and modifications were made as needed (Gottlieb, 1997).

In the initial development of the ADAPT, inter-rater agreement was reported to average .69 across items using raters with no previous training in the measure or experience with DAP (Gottlieb, 1997). Before the instrument was used in the national evaluation, a separate test of inter-rater reliability was conducted in which two graduate students who were familiar with the instrument and the construct being measured assessed the same 30 classrooms. Item-specific kappas ranged from .60 to .80, which is generally in the acceptable range (Suen & Ary, 1989). For the current study, the authors of the ADAPT trained the people overseeing evaluations at each site, who in turn trained the raters to a level of 80% agreement or better. Although achievement of this standard could not be verified empirically, the high levels of consistency for teachers rated 2 years in a row (reported below) suggest that the reliability of the measure was quite high.
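Item-level agreement of the kind reported above is straightforward to compute with standard tools. A minimal sketch using Cohen's kappa for two raters follows; the file and column names are hypothetical, and the weighted variant is noted as an option rather than the procedure used in the study:

```python
import pandas as pd
from sklearn.metrics import cohen_kappa_score

# Hypothetical layout: one row per classroom from a reliability pass, with
# each rater's 1-5 score on the same ADAPT item.
ratings = pd.read_csv("reliability_pass.csv")  # columns: rater1_1a, rater2_1a, ...

# Unweighted kappa treats the 1-5 ratings as nominal categories; passing
# weights="quadratic" would instead give partial credit for near-misses
# on the ordinal scale.
kappa = cohen_kappa_score(ratings["rater1_1a"], ratings["rater2_1a"])
print(f"Item 1A: kappa = {kappa:.2f}")
```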
2.3. Data analyses

Multilevel analyses of covariance matrices were used to establish the factor structure of the ADAPT and to estimate internal reliability. First, we examined the need for multilevel analyses using intra-class correlation coefficients (ICCs) for each item (Heck & Thomas, 2000; Hox, 1998). Then, multilevel confirmatory factor analyses were used to test the measurement model in which teachers were nested
within schools. Multilevel covariance structure analyses (MCA) were originally proposed by Muthén (1989) and later detailed by Muthén (1994). These analyses followed the steps described in Muthén's 1994 paper, using Mplus 2.01 (Muthén & Muthén, 2001). Additional psychometric statistics were also computed, including alpha coefficients of internal reliability for the between and within components of each subscale. The alphas were computed separately for the within and between school correlation matrices in the same way as conventional alpha coefficients, except that the correlations used in the computations were weighted for group size.

Because these data were collected as part of an intervention designed to affect classroom practices, analyses were conducted to demonstrate that the factor structure was not affected by the intervention. Procedures recommended in an extensive review of the measurement invariance literature by Vandenberg and Lance (2000) were adapted for use with these multilevel data. Because classrooms were randomized at the school level, multigroup analyses examining differences between classrooms within treatment and control schools are appropriate. Multilevel multigroup MCA were conducted to determine whether the covariance matrices differed between treatment and control schools.

Finally, concurrent validity was examined using multivariate HLM models that related the ADAPT to the Assessment Profile. Although the Assessment Profile should not be viewed as a comprehensive measure of DAP, it is reasonable to expect that it would relate moderately to the ADAPT, given that both tools seek to rate structural and instructional aspects of classrooms. For these analyses, the three ADAPT subscale scores were regressed on classroom-level Assessment Profile ratings, with classrooms treated as nested within schools.
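The decomposition at the heart of MCA can be sketched in a few lines. The function below computes the pooled within-school and scaled between-school covariance matrices in the spirit of Muthén's (1994) steps; the function name and arguments are ours, and this is an illustration rather than a reimplementation of Mplus:

```python
import numpy as np
import pandas as pd

def pooled_within_between(df, group_col, item_cols):
    """Split item covariance into a pooled within-group matrix (S_PW) and a
    scaled between-group matrix (S_B), the two inputs to a multilevel
    covariance structure analysis."""
    X = df[item_cols].to_numpy(dtype=float)
    groups = df[group_col].to_numpy()
    grand_mean = X.mean(axis=0)
    p = len(item_cols)
    s_pw, s_b = np.zeros((p, p)), np.zeros((p, p))
    labels = np.unique(groups)
    for g in labels:
        Xg = X[groups == g]
        dev_w = Xg - Xg.mean(axis=0)           # classrooms around their school mean
        s_pw += dev_w.T @ dev_w
        dev_b = Xg.mean(axis=0) - grand_mean   # school mean around the grand mean
        s_b += len(Xg) * np.outer(dev_b, dev_b)
    s_pw /= len(X) - len(labels)               # pooled within-school covariance
    s_b /= len(labels) - 1                     # scaled between-school covariance
    return s_pw, s_b
```

The within-school matrix is the input for the exploratory analyses reported in Section 3.3, and the two matrices together feed the two-level confirmatory models.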
3. Results

3.1. Content validation

If the ADAPT is to be used as a measure of the NAEYC DAP guidelines, it is important first to demonstrate that the items correspond to the central components of the guidelines. This was done using qualitative techniques based on the principles of grounded theory to identify the areas of DAP (Glaser & Strauss, 1967). The process was conducted separately for each of the six guidelines and their subcomponents in the 1997 manual (Bredekamp & Copple, 1997). The qualitative analyses involved identifying the theme of each component using the description provided in the manual, the detailed description of DAP provided for ages 6–8, and the examples of appropriate and inappropriate practices for this age group. The six components of DAP are: creating a caring community of learners; teaching to enhance development; constructing appropriate curriculum; assessing children's learning and development; establishing reciprocal relationships with parents; and policies supporting learning (Bredekamp & Copple, 1997, pp. 16–22). Once this was done, each ADAPT item was separately evaluated for its relevance to each of the DAP guidelines and their components. This coding was conducted by the first author and a graduate student trained in qualitative analyses. The technique is similar to qualitative analysis in which themes are identified in the data, a coding system is devised, and then each response is coded according to the occurrence of those themes (Patton, 1987). In this case the themes and their definitions were identified using the NAEYC guidelines, and the data being coded were the ADAPT items. To provide further evidence of content validity, these results were reviewed by two senior researchers familiar with the DAP guidelines. The review resulted in some modifications of the qualitative coding.
Table 1
Relationship of ADAPT items to NAEYC DAP guidelines

I. Creating a caring community of learners
   Promoting a positive climate for learning: 1B, 1E, 1F, 2E, 2F, 3A, 3B
   Building a democratic community: 1F, 2D, 2E, 2F, 3C, 3D, 3E

II. Teaching to enhance development and learning
   Environment and schedule: 2B, 2C, 2E, 3A, 3B, 3D, 3F
   Teaching strategies: 1A, 1B, 1C, 1E
   Motivation and guidance: 2A, 2B, 2D, 2E, 2F, 3E

III. Constructing appropriate curriculum
   Integrated curriculum: 1A, 1C, 1D
   The continuum of development and learning: 1A, 1B, 1D, 1C
   Coherent, effective curriculum: 1B, 1C, 1F
   Curriculum content: 1A, 1C, 1D, 3A, 3B

IV. Assessing children's learning and development: (no ADAPT items)
V. Establishing reciprocal relationships with parents: (no ADAPT items)
VI. Policies supporting learning: (no ADAPT items)

Note. The labels and anchors for the item numbers reported in this table can be found in Tables 2 and 3.
The final coding of each ADAPT item to the components of DAP is reported in Table 1. An example of this process is the coding of the ADAPT item Organization, which was found to measure two components of DAP, the "classroom promotes a positive climate for learning" and "effective teaching strategies" (Bredekamp & Copple, 1997). The DAP guidelines state that appropriate practice would be indicated by "responsiveness to individual children . . . in the classroom environment, curriculum, and teaching practices" (Bredekamp & Copple, 1997, p. 162). The ADAPT item assesses the degree to which the "teacher adapts instruction to children's interests, needs, and prior knowledge." The guidelines also state that teaching strategies should "adapt instruction for individual children who are having difficulty as well as for those who are . . . more advanced" (Bredekamp & Copple, 1997, p. 165). The link is clear at the inappropriate end of the scale as well: the guidelines state that it is inappropriate when "teachers use the same lesson and methods for all children without regard to differences in children's prior experience" (Bredekamp & Copple, 1997, p. 162), and the inappropriate side of the ADAPT Organization item states that the "teacher follows lesson plan or curriculum guides; involvement of children limited to prescribed questions."

One result of the qualitative coding of the DAP guidelines is the illustration of overlap between the different constructs outlined in the guidelines. Because of this overlap, an ADAPT item may measure more than one area of DAP. Three of the six components of DAP identified in the 1997 guidelines were found to be measured by the ADAPT: (1) creating a caring community of learners, (2) teaching to enhance development, and (3) constructing appropriate curriculum. The three DAP areas not assessed by the ADAPT were: (1) assessing children's learning and development, (2) establishing reciprocal relationships with parents, and (3) policies supporting learning. This is not surprising, as these areas are not suited to a 1-day observational protocol and perhaps should be measured via a combination of administrative record review and interviews.
The ADAPT concentrates on measuring those aspects of DAP that are part of everyday classroom activities.

3.2. Basic measurement properties

Because the ADAPT items are ordinal, polychoric correlations would ordinarily be preferred, but these are not yet available in an MCA context; it was therefore necessary to establish that the ADAPT item distributions were approximately normal. Analyses of the distributions showed that they were approximately normal for all but one item (see Table 2 for the distributions of items in year 1). The exception was the Multiculturalism item, for which skewness ranged from 1.33 to 1.77 and kurtosis from 1.87 to 2.71 across the years. This item had a large floor effect, with only very few classrooms scoring 3 or above. Listwise deletion was used for these analyses because it has been found to be unbiased when 5% or fewer of the cases are lost (Graham, Elek-Fisk, & Cumsille, in press).

The examination of school clustering for each item in each year showed a very strong relationship between school membership and ADAPT ratings, with schools accounting for 12–44% of the variance in DAP (see Table 3). The finding that classrooms within a school tend to be much more similar to one another than to classrooms in other schools illustrates the need for multilevel analyses with these data.
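The distributional screening described in this section can be sketched as follows; the file and column names are hypothetical, and the flagging thresholds are illustrative rather than the criteria used in the study:

```python
import pandas as pd
from scipy.stats import kurtosis, skew

# Hypothetical layout: one row per classroom, columns i1a ... i3f holding
# the 1-5 ADAPT item scores for a single year.
items = pd.read_csv("adapt_year1_items.csv")

for col in items.columns:
    x = items[col].dropna()
    s, k = skew(x), kurtosis(x)  # kurtosis() returns excess kurtosis (0 = normal)
    flag = "inspect" if abs(s) > 1.0 or abs(k) > 2.0 else "ok"
    print(f"{col}: skewness = {s:.2f}, kurtosis = {k:.2f} [{flag}]")
```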
3.3. Construct validity

The next step involved evaluating the factor structure of the ADAPT. This analysis was designed to evaluate whether DAP is primarily a single set of closely related teaching practices covering different overlapping areas or whether its components are, in practice, more differentiated. Because different domains or factors of DAP have not been established, exploratory factor analyses were conducted using the pooled-within school covariance matrix in the year 1 data (this matrix represents differences among classrooms within schools). At the within school level, a three-factor solution provided the best fit to the data: "integrated curriculum," "social/emotional emphasis," and "child-centered approaches." We note that while these descriptors are theoretically appealing, the items were not written with these subscales in mind, and thus the fit of the items with the descriptors is not perfect.

Two items did not fit well by standard criteria (i.e., factor loadings of .40 or more). One of these items, Multiculturalism, was retained on the integrated curriculum subscale because of its theoretical importance and because the high skewness and restricted range of the item would be expected to attenuate its correlations with the other items on the scale. The second item that failed to fit clearly in the factor structure was "classroom learning style," which appeared to be related to two of the three constructs identified in the exploratory analyses (loadings were right at .40 for each factor). In order to maintain the integrity of the subscales, we dropped this item from further analyses, although it may merit continued inclusion in the tool and use as a stand-alone item.

The exploratory factor analyses in year 1 were followed by confirmatory factor analyses conducted separately for each of the 3 years. MCA were used to test both the within and between school factor structures. Evaluation of fit statistics, parameter estimates, and residual matrices found that within schools a three-factor solution continued to fit best, but between schools one factor generally fit just as well as the more complex three-factor solution (see Table 4 for a comparison of fit statistics). In the three-factor between-school solution, the correlations between the factors were all over .95, suggesting that the factors were measuring essentially the same construct.
Table 2
Distribution of items on A Developmentally Appropriate Practices Template in year 1

1A. Comprehensive purpose
    DIP anchor (1): No discernible purpose in instruction; instructional outcomes/objectives are not clear
    DAP anchor (5): Purpose covers social, linguistic, cognitive, and academic development of children
    M = 2.84, S.D. = .87, Skewness = −.15, Kurtosis = .44

1B. Organization flexibility
    DIP anchor (1): Teacher follows lesson plan or curriculum guides; involvement of children limited to prescribed questions
    DAP anchor (5): Teacher adapts instruction to children's interests, needs, and prior knowledge
    M = 2.53, S.D. = .95, Skewness = .13, Kurtosis = .73

1C. Concept-based content/focus
    DIP anchor (1): Narrow in scope with emphasis on factual knowledge and accumulation of discrete skills
    DAP anchor (5): Concepts, principles, issues reflected in themes: personally and socially relevant action to children's lives in and out of classroom
    M = 2.50, S.D. = .89, Skewness = .14, Kurtosis = .61

1D. Curricular-driven literacy development
    DIP anchor (1): Emphasis on basal with phonics and decoding skills, spelling, and mechanics in isolation
    DAP anchor (5): Literacy integrated across content areas with literacy materials of social relevance
    M = 2.52, S.D. = .95, Skewness = −.36, Kurtosis = .29

1E. Multiple and diverse strategies
    DIP anchor (1): Use of learning strategies that rely on children's recall, recognition, and recitation
    DAP anchor (5): Use of learning strategies that tap children's linguistic, spatial, logical-mathematical, and musical potential
    M = 2.12, S.D. = .97, Skewness = −.16, Kurtosis = .67

1F. Multiculturalism
    DIP anchor (1): No evidence of linguistic and cultural diversity in classroom or in instructional practices
    DAP anchor (5): Multicultural issues integrated into curriculum, instruction, and school leading to social action
    M = 1.65, S.D. = .88, Skewness = 1.58, Kurtosis = 2.52

2A. Children form a community of learners
    DIP anchor (1): Children work independently; interaction not promoted
    DAP anchor (5): Cooperative, encouraging, complimentary, supporting each other at all times
    M = 2.67, S.D. = .98, Skewness = −.55, Kurtosis = .15

2B. Learning is teacher facilitated
    DIP anchor (1): Teacher directs all learning and maintains locus of control; formal relationship with children
    DAP anchor (5): Teacher facilitates learning, supports children's decisions, and advocates for each child on his/her behalf
    M = 2.74, S.D. = 1.02, Skewness = −.65, Kurtosis = .10

2C. Children actively interact with materials
    DIP anchor (1): Children do not have access to classroom materials outside of textbooks/workbooks
    DAP anchor (5): Children encouraged to choose and interact with materials to create and problem-solve
    M = 2.52, S.D. = .95, Skewness = .02, Kurtosis = .51

2D. Children nurtured socially and emotionally
    DIP anchor (1): Children's weaknesses accentuated by teacher or their own identity is questioned
    DAP anchor (5): Children's social and emotional development consistently supported by peers and teacher
    M = 2.91, S.D. = .79, Skewness = .78, Kurtosis = .36

2E. Classroom learning style is child centered
    DIP anchor (1): Teacher-initiated and controlled with children not actively engaged
    DAP anchor (5): Activities initiated or supported by children in collaboration with teacher
    M = 2.64, S.D. = .79, Skewness = .33, Kurtosis = .27

2F. Classroom climate is harmonious
    DIP anchor (1): No harmony; competitive, harsh tone between children and teacher
    DAP anchor (5): Children and teacher collaborate; classroom exemplifies community of learners with shared goals
    M = 2.90, S.D. = .82, Skewness = .12, Kurtosis = .13

3A. Physical space and arrangement is inviting
    DIP anchor (1): Physical arrangement limited to rows of desks, textbooks, commercial displays and workbooks
    DAP anchor (5): Learning centers inviting to children, with variety of real objects; displays of children-generated work
    M = 2.44, S.D. = 1.09, Skewness = −.46, Kurtosis = .52

3B. Child-generated resources
    DIP anchor (1): Basal readers, worksheets, publisher materials, commercial posters
    DAP anchor (5): Children-made books, original projects and products; computer software aligned with content
    M = 2.47, S.D. = 1.00, Skewness = −.49, Kurtosis = .49

3C. Child grouping is flexible
    DIP anchor (1): Large group activities predominate, irrespective of task
    DAP anchor (5): Children have choices in grouping patterns by task
    M = 2.25, S.D. = 1.17, Skewness = −.72, Kurtosis = .53

3D. Learning is cooperative
    DIP anchor (1): No evidence of children working together
    DAP anchor (5): Children work interdependently to complete task or project and make joint decisions
    M = 2.26, S.D. = .97, Skewness = .00, Kurtosis = .58

3E. Children are mostly self-regulated
    DIP anchor (1): Children lack self-control; unaware of or do not pay heed to rules
    DAP anchor (5): Children engage in peer negotiation or conversation to resolve issues, or self-regulation is strongly evident
    M = 2.57, S.D. = .72, Skewness = 1.40, Kurtosis = .67

3F. Time is flexible
    DIP anchor (1): Specific time slots for each curricular area; reading, writing, math, science, social science treated independently
    DAP anchor (5): Time is flexible, based on input from children, work required for projects, and constraints of the day
    M = 2.35, S.D. = 1.01, Skewness = .43, Kurtosis = .90

Notes. DAP: Developmentally Appropriate Practice; DIP: Developmentally Inappropriate Practice. ADAPT item anchors are under copyright and reprinted with permission. For further information about this measure contact Margo Gottlieb, [email protected].
Further, the three-factor within and one-factor between model (see Fig. 1) provided an excellent fit to the data (although the performance of fit criteria has not been evaluated specifically in the multilevel covariance analysis framework, Hu and Bentler (1999) suggest RMSEA of .06 or less combined with SRMR of .08 or less as cutoffs providing good Type I error protection). Chi-square difference tests are not available when using the WLSMV chi-square statistic because this statistic adjusts the degrees of freedom to obtain correct p-values (Muthén, du Toit, & Spisic, 1997; Muthén & Muthén, 2001). Various other modifications to this structure were tried, including the scale developer's originally suggested ADAPT factors, but none improved upon these models. The three within school factors correlated highly with one another but are distinct enough to be considered separately. The parameter estimates for all years are reported in Table 5.

Following the establishment of the measurement model of the ADAPT, subscale scores were created by averaging all the items on each scale. These scores were examined for each year (see Table 6). The average for all the scales was about 2.5, just under the center of the range of possible values; the standard deviations were about .75; and none of the scales exhibited meaningful skewness or kurtosis in any of the 3 years.

Reliability was calculated by applying the Cronbach's alpha procedure to the within and between covariance matrices for each year. Alphas for the three subscale scores between classrooms were acceptable, ranging from .87 to .91 across all years. For the global measure of DAP between schools, the alpha was .98 for each of the 3 years. Further analyses examined the ICCs for each of the latent variables (see the sixth row of Table 6) using the variance components from the models with three factors within schools and three factors between schools (Heck, 2001; Muthén, 1991). The ICCs for the latent variables were quite similar to those for the observed variables, except that the ICC for Social/Emotional Emphasis was lower in the third year.
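Because these alphas are computed from the within- and between-school matrices rather than from raw scores, it is convenient to work directly from a covariance matrix. A minimal sketch follows; the toy numbers are illustrative only:

```python
import numpy as np

def alpha_from_cov(S):
    """Cronbach's alpha from a covariance matrix S, so the same routine can
    be applied to a pooled within-school or a between-school matrix."""
    k = S.shape[0]
    return (k / (k - 1.0)) * (1.0 - np.trace(S) / S.sum())

# Toy 3-item covariance matrix, illustrative numbers only:
S = np.array([[1.00, 0.60, 0.50],
              [0.60, 1.00, 0.55],
              [0.50, 0.55, 1.00]])
print(f"alpha = {alpha_from_cov(S):.2f}")  # -> alpha = 0.79
```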
Table 3
Intra-class correlations of ADAPT items for each year

ADAPT item                                                   Year 1  Year 2  Year 3
1A. Comprehensive purpose                                     .30     .28     .26
1B. Organization flexibility                                  .27     .27     .23
1C. Concept-based content/focus                               .27     .29     .30
1D. Curricular-driven literacy development                    .27     .32     .36
1E. Multiple and diverse strategies—cognitive involvement     .30     .28     .39
1F. Multiculturalism                                          .41     .39     .41
2A. Children form a community of learners                     .27     .22     .30
2B. Learning is teacher facilitated                           .28     .32     .27
2C. Children actively interact with materials                 .28     .33     .34
2D. Children nurtured socially and emotionally                .22     .20     .20
2E. Classroom learning style is child centered                .28     .26     .27
2F. Classroom climate is harmonious                           .19     .18     .12
3A. Physical space and arrangement is inviting                .27     .35     .37
3B. Child-generated or real-life materials and resources      .32     .32     .44
3C. Child grouping is flexible                                .26     .29     .33
3D. Learning is cooperative and interdependent                .35     .24     .30
3E. Children are mostly self-regulated                        .31     .18     .23
3F. Time is flexible allowing for integration of learning     .37     .34     .39
Table 4
Fit indices for multilevel confirmatory factor analyses

Model                                                  χ²     d.f.  RMSEA  TLI   SRMR within  SRMR between
Year 1
  One factor within, 1 factor between schools          1604   270   .07    .86   .08          .09
  Three factors within, 1 factor between schools       1015   267   .05    .92   .05          .05
  Three factors within, 3 factors between schools       998   264   .05    .92   .04          .04
Year 2
  One factor within, 1 factor between schools          1551   270   .06    .93   .08          .09
  Three factors within, 1 factor between schools       1007   267   .04    .95   .05          .05
  Three factors within, 3 factors between schools       979   264   .04    .95   .04          .05
Year 3
  One factor within, 1 factor between schools           958   270   .06    .92   .07          .06
  Three factors within, 1 factor between schools        659   267   .04    .95   .06          .05
  Three factors within, 3 factors between schools       641   264   .04    .96   .06          .05

Notes. The reported χ² statistic is Muthén's MLM statistic, a maximum likelihood mean-adjusted statistic with robust standard errors. RMSEA is the Root Mean Square Error of Approximation, TLI is the Tucker–Lewis Fit Index, and SRMR is the Standardized Root Mean Square Residual, reported separately for the within and between school covariance matrices.
Fig. 1. Within and between school ADAPT factors.
Finally, for those teachers whose classrooms were rated in more than 1 year, cross-year stability was assessed using the pooled within and between school correlation matrices. The within school correlations across years were moderate, ranging from .30 to .54 (see the bottom of Table 6). As would be expected, the lower correlations were found in the small group of classrooms for which ratings were available for the first and third years. Weighted correlations of the global DAP measure between schools were also calculated and ranged from .30 to .50. Overall, the ADAPT subscale scores between and within schools are normally distributed, highly reliable, and moderately consistent over time.

3.4. Measurement invariance

A concern in the present study is that the data were obtained over the course of a planned intervention to enhance DAP in classrooms. Tests of factor invariance were conducted to evaluate measurement invariance for the intervention and control schools in each year in the fashion described above.
Table 5
Standardized factor loadings of ADAPT items for each year

                                                              Year 1      Year 2      Year 3
ADAPT factors/items                                           w     b     w     b     w     b

Integrated curriculum
  1A. Comprehensive purpose                                  .76   .91   .80   .81   .80   .81
  1B. Organization flexibility                               .76   .90   .79   .86   .80   .85
  1C. Concept-based content/focus                            .81   .99   .83   .94   .84   .94
  1D. Curricular-driven literacy development                 .75   .95   .76   .94   .76   .86
  1E. Multiple and diverse strategies—cognitive involvement  .79   .97   .82   .91   .83   .94
  1F. Multiculturalism                                       .36   .76   .46   .58   .33   .67
  3F. Time is flexible allowing for integration of learning  .72   .91   .75   .91   .77   .86

Social/emotional emphasis
  2B. Learning is teacher facilitated                        .86   .84   .86   .94   .89   .88
  2D. Children nurtured socially and emotionally             .81   .89   .83   .96   .84   .71
  2F. Classroom climate is harmonious                        .85   .99   .86   .97   .86  1.00
  3E. Children are mostly self-regulated                     .60   .67   .66   .40   .63   .49

Child-centered approaches
  2A. Children form a community of learners                  .74   .88   .81   .93   .83   .91
  2C. Children actively interact with materials              .79   .93   .80   .86   .78   .83
  3A. Physical space and arrangement is inviting             .76   .89   .74   .87   .68   .99
  3B. Child-generated or real-life materials and resources   .72   .87   .75   .90   .69   .96
  3C. Child grouping is flexible                             .68   .92   .72   .90   .75   .82
  3D. Learning is cooperative and interdependent             .72   .89   .76   .96   .77   .86

Note. w: within school factor loading; b: between school factor loading on a single ADAPT factor.

Table 6
Descriptives, reliabilities, ICCs, and cross-year correlations of ADAPT subscales

                                         Integrated curriculum     Social/emotional emphasis   Child-centered approaches
                                         Year 1  Year 2  Year 3    Year 1  Year 2  Year 3      Year 1  Year 2  Year 3
Mean                                      2.36    2.43    2.42      2.78    2.86    2.88        2.44    2.47    2.43
Standard deviation                         .75     .73     .71       .71     .71     .71         .84     .84     .84
Skewness                                   .84     .54     .45       .36     .11     .05         .59     .34     .44
Kurtosis                                   .30    −.10    −.54       .37     .27    −.07        −.17    −.49    −.41
Alpha based on PW                          .90     .91     .89       .87     .88     .87         .87     .88     .87
ICCs for scales                            .38     .23     .19       .23     .29     .09         .29     .23     .21
Classroom level correlations, year 2       .53                       .52                         .54
Classroom level correlations, year 3       .30     .38               .50     .38                 .36     .45

Note. Correlations of year 1 with year 2 were based on 323 classrooms, year 1 with year 3 on 79 classrooms, and year 2 with year 3 on 476 classrooms.
These analyses revealed that the covariance matrices in the treatment and control groups were very similar across years (RMSEAs were all .03 or below; the CFIs and TLIs were all .97 or higher). Because this paper concentrates on the measurement properties of the ADAPT, we do not report intervention effects on scale means here; the relevant point is that the intervention did not alter the covariance matrices.
3.5. Convergent validity

A final set of analyses examined the relationship of the ADAPT to the Assessment Profile using multivariate multilevel models in which the three ADAPT subscale scores were regressed on the five Assessment Profile scores separately for each year. The simplifying assumption was made that the level-1 (σ²) variance–covariance matrix for the three ADAPT scales consists of a single scalar (comparisons with models that relaxed this assumption found no differences in conclusions). Because the interest in these analyses was in the relationship of the two measures at the classroom level, the Assessment Profile scores were entered into the model as fixed predictors. The results indicated that Assessment Profile scores are moderately to strongly related to the ADAPT subscale scores (see Table 7), accounting for 67–83% of the classroom level variance and 37–64% of the school level variance in ADAPT scores. The results for the 3 years were essentially the same, with four of the five Assessment Profile scales – Learning environment, Curriculum, Interacting, and Individualizing – significantly related to the ADAPT scales. The one exception was that, in the third year, the Assessment Profile Scheduling scale was also significantly related to the ADAPT. Gamma values, the weighted average slopes across schools, are reported in Table 7. These values appear very small because the Assessment Profile scores ranged from 20 to 70, while the ADAPT subscales ranged from 1 to 5.
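A simplified version of these models can be sketched with a random-intercept regression fit separately for each subscale. The paper's analysis was multivariate, so this per-subscale approximation, like the file and variable names below, is ours:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical layout: one row per classroom, with its school id, one ADAPT
# subscale score, and the five Assessment Profile subscale scores.
df = pd.read_csv("year1_scores.csv")

fit = smf.mixedlm(
    "integrated_curriculum ~ learning_env + scheduling + curriculum"
    " + interacting + individualizing",
    data=df,
    groups=df["school"],  # random intercept for school
).fit()
print(fit.summary())  # the fixed-effect slopes play the role of the gammas in Table 7
```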
Table 7
Multivariate hierarchical linear models relating the Assessment Profile to ADAPT

Model/parameter                                            γ          S.E.
Year 1: classroom level R² = .75, school level R² = .56
  Intercept                                               −1.342**    .153
  Learning environment                                      .021**    .003
  Scheduling                                                .003      .002
  Curriculum                                                .026**    .003
  Interacting                                               .028**    .002
  Individualizing                                           .008**    .002
Year 2: classroom level R² = .67, school level R² = .64
  Intercept                                               −1.564**    .128
  Learning environment                                      .022**    .002
  Scheduling                                                .002      .002
  Curriculum                                                .030**    .002
  Interacting                                               .026**    .002
  Individualizing                                           .006**    .002
Year 3: classroom level R² = .83, school level R² = .37
  Intercept                                               −1.911**    .170
  Learning environment                                      .019**    .003
  Scheduling                                                .005*     .002
  Curriculum                                                .032**    .002
  Interacting                                               .032**    .002
  Individualizing                                           .005*     .002

Note. Multivariate analyses using HLM regress a linear composite of the multiple outcomes on the independent variables; thus, even though there are three outcome variables, there is just one regression coefficient for each predictor variable.
* p < .05. ** p < .001.
4. Discussion

This study found that the observer measure, A Developmentally Appropriate Practices Template, has strong psychometric properties and is a useful instrument for assessing observable developmentally appropriate practices in first through third grades as described in the 1997 NAEYC guidelines. The ADAPT was found to be internally reliable, to have a clearly defined factor structure at both the within and between school levels, and to be invariant over time. An added benefit is that ADAPT scores are now available for public use from this recent, large, multi-site study.² This provides a powerful tool for evaluating the effects of DAP with a rich set of classroom, family, and student variables. It is important to note that this study only validated the ADAPT for first through third grades. The changes needed to extend this range to younger children may be relatively minor; Buchanan and her colleagues, for example, reported making only minor adjustments to a kindergarten instrument when adapting it for the elementary school grades (Buchanan et al., 1998). This question is left for future research.

This study addressed four research goals aimed at evaluating the ADAPT and better understanding the use of DAP in schools. First, this study found that the ADAPT measures three of the six components of DAP identified in the revised DAP guidelines. The three areas not measured – assessing children's learning and development, establishing reciprocal relationships with parents, and policies supporting learning – are components of DAP that are not amenable to measurement by observers and indeed are not included on any other measure of DAP. These analyses highlight the need for future work on how these aspects of classrooms and schools can be measured.

The second aim of this study was to describe the results of the ADAPT ratings for the classrooms in this study. An evaluation of the ADAPT scores of these classrooms revealed a large amount of variability in the use of DAP. The averages for the ADAPT items and scales were all somewhat under the midpoint of the range of ADAPT scores, but the standard deviations were quite large. In other words, the vast majority of classrooms fall somewhere in the middle of the scale and there are a few outliers at each end. This suggests that analyses using only high and low DAP classrooms may miss what happens in the majority of classes.

The evaluation of the clustering of ADAPT scores within schools was also informative, suggesting that the use of DAP is to a large degree related to the school in which the classroom is located. Methodologically, this finding underscores the need for multilevel models whenever DAP is studied: researchers planning studies or conducting analyses on classroom effects should include schools as a level of nesting, because failure to address this issue can lead to biased results. Practically, the nesting of classrooms within schools indicates that educational policy and the preparation of both teachers and administrators need to consider the role of the school in enhancing or constraining DAP. Are school-level similarities primarily attributable to how teachers tailor their instruction to fit school policies? Do teachers in the same schools come from similar teacher preparation backgrounds? Does the nature of the student population influence the extent to which DAP is used? It is likely that a combination of these and other influences is responsible for the classroom similarity observed.
The third aim was to examine the construct validity of the ADAPT and expand our knowledge of the different aspects that comprise DAP and the extent to which the school contributes to classroom level practices.
² The data set and data dictionary are available free of charge to researchers from the U.S. Department of Health and Human Services, Administration for Children and Families. Information on how to obtain this data set (the Early Childhood Transition Demonstration Study) can be found on their website, www.acf.dhhs.gov, in the Head Start Data Archive.
The analyses identified three distinct components of DAP: integrated curriculum, social/emotional emphasis, and child-centered approaches. Although these are highly correlated, they have sufficient statistical independence, they involve theoretically different teacher behaviors, and, potentially, they have different influences on the young learner. This factor structure was shown to be stable across grade levels, a property which is essential for testing differential effects of DAP across grades. It is also informative for those who believe that the meaning of DAP changes across time, because it suggests that the measurement model, and therefore the meaning of the construct, is the same at least for these three grades. There may be mean differences in the use of DAP over time, but the meaning of the construct appears to be stable.

Finally, this study showed that the ADAPT is related to some of the subscales of the Assessment Profile. This finding provides evidence of convergent validity and illustrates that these different measures of the classroom environment are interrelated. A focus on broader aspects of classrooms and how they affect children may be beneficial.

Quality measurement is an essential foundation for research on organizational processes, and this paper encourages future study of DAP by providing support for a new research tool with high internal consistency and validity. These results are tempered by the inability to demonstrate inter-rater reliability with this data set. While the training protocol and the correlations of scores across years for the same teachers are suggestive of adequate inter-rater reliability, this is an area that needs further research. Studies using generalizability theory that could simultaneously assess inter-rater reliability and the effects of the nesting of classrooms within schools would be a welcome extension of this work. Further, subsequent studies examining the measurement of the untapped dimensions of DAP included in the NAEYC guidelines – namely, assessing children's learning and development, establishing reciprocal relationships with parents, and [school] policies – would be useful for developing a comprehensive measure of DAP.
Acknowledgments

The analyses reported here were based on the National Core Data sets collected for the National Evaluation of the Head Start/Public School Early Childhood Transition Demonstration Project, which was designed and implemented by the National Transition Demonstration Consortium. The Consortium was comprised of Principal Investigators and Project Directors from each of the 31 local sites, Principal Investigators and staff from the Civitan International Research Center at the University of Alabama at Birmingham, staff from the Administration on Children, Youth and Families, and a National Research Advisory Panel/Technical Work Group for the study. We are deeply appreciative of the collaboration, advice, and support received from colleagues throughout the country. Gratitude is extended to all persons who are or have been part of the National Transition Demonstration Consortium. Margo Gottlieb and Sue Rasher deserve special credit; without their contributions the ADAPT would not be a reality. We further acknowledge Stephanie Chopko for her assistance with the qualitative data analyses and Jennifer Beyers for her extensive editorial help. The first author would also like to thank the members of his dissertation committee, Sharon Ramey (Chair), Jerry Aldridge, Craig Ramey, Scot Snyder, and Michael Windle, for their support and guidance during the completion of this research. The national evaluation was supported by grants from the Head Start Bureau of the Administration on Children, Youth, and Families to each of the 31 local demonstration sites, as well as by a coordinating contract (#105-91-1935) to the Civitan International Research Center at the University of Alabama at Birmingham.
References Abbott-Shim, M., & Sibley, A. (1992). Assessment Profile for Early Childhood Programs: Research version. Atlanta, GA: Quality Assist, Inc. Abbott-Shim, M. S., Sibley, A., & Neel, J. (1998). Psychometric report of the Assessment Profile for Early Childhood Programs: Research version of the National Head Start Demonstration Project. Atlanta, GA: Quality Assist, Inc. Bredekamp, S. (1987). Developmentally appropriate practice in early childhood programs serving children from birth through age 8: Expanded edition. Washington, DC: NAEYC. Bredekamp, S. (1993). Myths about developmentally appropriate practice: A response to Fowell and Lawton. Early Childhood Research Quarterly, 8(1), 117–119. Bredekamp, S., & Copple, C. (Eds.). (1997). Developmentally appropriate practice in early childhood programs (Revised ed.). Washington, DC: National Association for the Education of Young Children. Bryk, A. S., & Raudenbush, S. W. (1992). . Hierarchical linear models: Vol. 1. London: Sage Publications. Buchanan, T. K., Burts, D. C., Bidner, J., White, V. F., & Charlesworth, R. (1998). Predictors of the developmentally appropriateness of the beliefs and practices of first, second, and third grade teachers. Early Childhood Research Quarterly, 13(3), 459–483. Burts, D. C., Hart, C. H., Charlesworth, R., DeWolf, D. M., Ray, J., Manuel, K., et al. (1993). Developmental appropriateness of kindergarten programs and academic outcomes in first grade. Journal of Research in Childhood Education, 8(1), 23–31. Burts, D. C., Hart, C. H., Charlesworth, R., Fleege, P. O., Mosley, J., & Thomasson, R. H. (1992). Observed activities and stress behaviors of children in developmentally appropriate and inappropriate kindergarten classrooms. Early Childhood Research Quarterly, 7(2), 297–318. Burts, D. C., Hart, C. H., & Kirk, L. (1990). A comparison of frequencies of stress behaviors observed in kindergarten children in classrooms with developmentally appropriate versus developmentally inappropriate instructional practices. Early Childhood Research Quarterly, 5(3), 407–423. Charlesworth, R., Hart, C. H., Burts, D. C., & Hernzndez, S. (1990, April). Kindergarten teachers’ beliefs and practice. Paper presented at the Annual Meeting of the American Educational Research Association, Boston, MA. Charlesworth, R., Hart, C. H., Burts, D. C., & Hernandez, S. (1991). Kindergarten teachers beliefs and practices. Early Child Development and Care, 70, 17–35. Charlesworth, R., Hart, C. H., Burts, D. C., & Thomasson, R. H. (1993). Measuring the developmental appropriateness of kindergarten teachers’ beliefs and practices. Early Childhood Research Quarterly, 8(3), 255–276. Fowell, N., & Lawton, J. T. (1993). Beyond polar descriptions of developmentally appropriate practice: A reply to Bredekamp. Early Childhood Research Quarterly, 8(1), 121–124. Glaser, B. G. (1992). Erergence vs. forcing: Basics of ground theory analysis. Mill Valley, CA: Sociology Press. Glaser, B. G., & Strauss, A. L. (1967). Discovery of grounded theory: Strategies for qualitative research. Chicago: AVC. Gottlieb, M. (1997). A Developmentally Appropriate Practice Template administration and technical manual. Des Plaines, IL: Illinois Resource Center/OER Associates. Graham, J. W., Elek-Fisk, E., & Cumsille, P. E. (in press). Methods for handling missing data. In W. F. Velicer (Ed.), Comprehensive handbook of psychology (Vol. 2). John Wiley & Sons. Harms, T., & Clifford, R. M. (1980). Early childhood environmental rating scale. New York: Teachers College Press. 
Heck, R. H. (2001). Multilevel modeling with SEM. In R. E. Schumacker (Ed.), New developments and techniques in structural equation modeling (pp. 89–127). Mahwah, NJ: Lawrence Erlbaum Publishers, Inc.
Heck, R. H., & Thomas, S. L. (2000). An introduction to multilevel modeling techniques. Mahwah, NJ: Lawrence Erlbaum Associates.
Hemmeter, M. L., Maxwell, K. L., Ault, M. J., & Schuster, J. W. (2001). Assessment of practices in early elementary classrooms. New York: Teachers College Press.
Hirsh-Pasek, K., Hyson, M. C., & Rescorla, L. (1990). Academic environments in preschool: Do they pressure or challenge young children? Early Education and Development, 1, 401–423.
Hox, J. J. (1995). Applied multilevel analysis. Amsterdam: TT-Publikaties.
Hox, J. J. (1998). Multilevel modeling: When and why. In M. Schader (Ed.), Classification, data analysis, and data highways. New York: Springer-Verlag.
Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6(1), 1–55.
Huffman, L. R., & Speer, P. W. (2000). Academic performance among at-risk children: The role of developmentally appropriate practices. Early Childhood Research Quarterly, 15(2), 167–184.
Hyson, M. C., Hirsh-Pasek, K., & Rescorla, L. (1990). The Classroom Practices Inventory: An observation instrument based on NAEYC's guidelines for developmentally appropriate practices for 4- and 5-year-old children. Early Childhood Research Quarterly, 5(4), 475–494.
Jones, I., & Gullo, D. (1999). Differential social and academic effects of developmentally appropriate practices and beliefs. Journal of Research in Childhood Education, 14, 26–35.
Maxwell, K. L., McWilliam, R. A., Hemmeter, M. L., Ault, M. J., & Schuster, J. W. (2001). Predictors of developmentally appropriate classroom practices in kindergarten through third grade. Early Childhood Research Quarterly, 16, 431–452.
McGrew, K. S., Gilman, C. J., & Johnson, S. (1992). A review of scales to assess family needs. Journal of Psychoeducational Assessment, 10(1), 4–25.
Muthén, B. O. (1989). Latent variable modeling in heterogeneous populations. Psychometrika, 54, 557–585.
Muthén, B. O. (1991). Multilevel factor analysis of class and student achievement components. Journal of Educational Measurement, 28(4), 338–354.
Muthén, B. O. (1994). Multilevel covariance structure analysis. Sociological Methods and Research, 22, 376–398.
Muthén, B. O., du Toit, S. H. C., & Spisic, D. (1997). Robust inference using weighted least squares and quadratic estimating equations in latent variable modeling with categorical and continuous outcomes. Unpublished paper.
Muthén, L. K., & Muthén, B. O. (2001). Mplus (Version 2). Los Angeles: Muthén & Muthén.
Patton, M. Q. (1987). How to use qualitative methods in evaluation. Newbury Park, CA: Sage Publications.
Ramey, C. T., Ramey, S. L., & Phillips, M. M. (1996). Head Start children's entry into public school: An interim report on the National Head Start-Public School Early Childhood Transition Demonstration Study. Washington, DC: Report prepared for the U.S. Department of Health and Human Services, Head Start Bureau.
Ramey, S. L., Ramey, C. T., Phillips, M. M., Lanzi, R. G., Brezausek, C., Katholi, C. R., et al. (2001). Head Start children's entry into public school: A report on the National Head Start/Public School Early Childhood Transition Demonstration Study. Washington, DC: Department of Health and Human Services, Administration on Children, Youth, and Families.
Sirotnik, K. A. (1980). Psychometric implications of the unit-of-analysis problem (with examples from the measurement of organizational climate). Journal of Educational Measurement, 17(4), 245–282.
Stipek, D., Daniels, D., Galluzzo, D., & Milburn, S. (1992). Characterizing early childhood education programs for poor and middle-class children. Early Childhood Research Quarterly, 7, 1–19.
Stipek, D., Feiler, R., Daniels, D., & Milburn, S. (1995). Effects of different instructional approaches on young children's achievement and motivation. Child Development, 66(1), 209–223.
Stipek, D. J., Feiler, R., Byler, P., Ryan, R., Milburn, S., & Salmon, J. M. (1998). Good beginnings: What difference does the program make in preparing young children for school? Journal of Applied Developmental Psychology, 19(1), 41–66.
Suen, H. K., & Ary, D. (1989). Analyzing quantitative behavioral observation data. Hillsdale, NJ: Lawrence Erlbaum.
Tabachnick, B. G., & Fidell, L. S. (1996). Using multivariate statistics (3rd ed.). New York: HarperCollins Publishers.
Vandenberg, R. J., & Lance, C. E. (2000). A review and synthesis of the measurement invariance literature: Suggestions, practices, and recommendations for organizational research. Organizational Research Methods, 3, 4–69.
Walsh, D. J. (1991). Extending the discourse on developmental appropriateness. Early Education and Development, 2(2), 109–119.