Early Childhood Research Quarterly 47 (2019) 99–110
Measurement of early literacy skills among monolingual English-speaking and Spanish-speaking language-minority children: A differential item functioning analysis☆
J. Marc Goodrich a,∗, Christopher J. Lonigan b,∗∗, Sarah V. Alfonso c
a Department of Special Education and Communication Disorders, University of Nebraska, Lincoln, United States
b Department of Psychology and Florida Center for Reading Research, Florida State University, United States
c Department of Psychology, Florida State University, United States
Article info
Article history: Received 26 October 2017; received in revised form 7 September 2018; accepted 11 October 2018
Keywords: Language-minority; Early literacy; Differential item functioning; Spanish-speaking children; Preschool
Abstract
A critical issue in psychological and educational testing is whether assessments provide reliable and valid estimates of ability for different populations of individuals. This issue may be particularly relevant for populations who are not native speakers of the language in which the assessment is written. Therefore, the purpose of this study was to evaluate the utility of a norm-referenced assessment of English early literacy skills for Spanish-speaking language-minority (LM) children. Participants for this study (1221 preschool children, 751 of whom were identified as Spanish-speaking LM children) completed the Phonological Awareness, Print Knowledge, and Definitional Vocabulary subtests of the Test of Preschool Early Literacy (TOPEL). Item response theory analysis was conducted to examine student performance on each subtest, and performance of monolingual English-speaking and Spanish-speaking LM children was compared using differential item functioning (DIF) analysis. Results indicated that there was minimal DIF for the Phonological Awareness and Print Knowledge subtests. Substantially more DIF was evident on the Definitional Vocabulary subtest, although presence of DIF was not consistently in favor of monolingual English-speaking or Spanish-speaking LM children. Moreover, effect size estimates of DIF indicated that, across most test items, the magnitude of DIF was small to moderate. Taken together, these findings indicate that the TOPEL can be used to obtain valid and reliable estimates of Spanish-speaking LM preschoolers’ English early literacy skills. © 2018 Elsevier Inc. All rights reserved.
☆ This research and report were supported by a grant from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (HD060292). The views expressed herein are those of the authors and have not been reviewed or approved by the granting agency. As indicated by its source reference in this article, author Christopher J. Lonigan is the lead author of the Test of Preschool Early Literacy and receives occasional royalties.
∗ Corresponding author at: Department of Special Education and Communication Disorders, University of Nebraska-Lincoln, 271 Barkley Memorial Center, P.O. Box 830738, Lincoln, NE 68583-0738, United States.
∗∗ Corresponding author at: Department of Psychology, Florida State University, 1107 W. Call Street, Tallahassee, FL 32306-4301, United States.
https://doi.org/10.1016/j.ecresq.2018.10.007
ISSN 0885-2006

1. Introduction
The number of language-minority (LM) children, that is, children who speak a language other than the language spoken by the majority of a given population, enrolled in U.S. public schools has been increasing for the past decade (National Center for Education Statistics [NCES], 2016a). The majority (76.5%) of LM children in the U.S. speak Spanish at home. There is substantial evidence that Spanish-speaking LM children in the U.S. score lower than do their monolingual English-speaking peers on various measures of academic achievement. For example, in both 4th and 8th grade, Spanish-speaking LM children have significantly lower math achievement than monolingual English-speaking children (NCES, 2011), and results of the 2015 National Assessment of Educational Progress (NCES, 2015) indicate that this achievement gap has not narrowed since 2011. Deficits in reading are also prevalent, and the reading performance gap between Spanish-speaking LM and monolingual English-speaking children increases as children progress through school. For instance, the reading achievement gap is 37 points at 4th grade and 45 points at 8th grade (NCES, 2016b). Furthermore, Spanish-speaking LM children are at higher risk for dropping out of school than are their monolingual English-speaking peers (Callahan, 2013). One strategy for reducing the
achievement gap is to identify at-risk children as early as possible and provide remedial programs to improve their performance. Distinguishing children at risk for future poor academic performance from those who are not at risk requires reliable assessment instruments that can be easily used to monitor student progress and help researchers and practitioners make informed educational decisions. A number of reliable instruments for tracking the performance of monolingual English-speaking children are currently in use; however, there is some evidence that assessment tools that effectively discriminate between at-risk and not-at-risk monolingual English speakers may be less valid when used with various populations of LM children (e.g., Bedore, Peña, García, & Cortez, 2005). Given their elevated risk for poor academic performance and school dropout, identifying reliable and valid assessment tools that aid in the early detection of Spanish-speaking LM children at risk for academic failure is critical.
1.1. Assessment of LM children
In recent decades, legislators, researchers, and practitioners have emphasized the importance of unbiased assessment when testing LM children. For example, one of the core principles of the Individuals with Disabilities Education Improvement Act of 2004 (U.S. Congress, 2004) specifies that evaluations used to determine eligibility for special education services must be appropriate and unbiased. Additionally, according to the Standards for Educational and Psychological Testing (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 2014), characteristics intrinsic to the individual (e.g., language proficiency) that are not relevant to the construct of interest should neither advantage nor disadvantage the individual in assessment.
For LM children, assessment procedures that do not involve assessment in the home language (if that is the child’s most proficient language) may underestimate children’s true abilities and result in over-identification of risk status or of learning or intellectual disabilities. In fact, studies of disproportionate representation in special education indicate that LM children identified as English language learners (i.e., having limited English proficiency) have historically been overrepresented in disability categories such as intellectual disability and specific learning disability (Sullivan, 2011). This disproportionate representation may be partially due to biased assessment practices. For example, when tests of intellectual ability or academic achievement used to identify intellectual and learning disabilities are administered only in English, children with low levels of English proficiency may score substantially below average because of limited English language skills rather than a deficit in the construct being assessed (Artiles, Rueda, Salazar, & Higareda, 2005; Sullivan, 2011). Therefore, it can be important to consider the language of test administration to ensure unbiased assessment results for LM children. Despite an emphasis on ensuring that assessments used with LM children are unbiased, it is often unclear what makes an assessment unbiased. For example, there is no clearly established method for determining the optimal language in which to assess a child, as English proficiency can change over time (Abedi, 2004). Some Spanish-speaking LM children may speak only Spanish at home and have limited exposure to English prior to preschool or kindergarten, whereas other children may be exposed to Spanish and English equally at home. For a young child with limited exposure to English, assessment in the home language may be the most appropriate, unbiased assessment method.
In contrast, assessment in English may or may not be appropriate for a child exposed to Spanish and English equally at home. According to Abedi (2011), assessment of relative levels of proficiency in both the first (L1) and second (L2) languages must occur
prior to determining the language of assessment for other skills and abilities. However, other factors also warrant consideration when assessing LM children’s knowledge and skills. For example, evidence indicates that children perform better on assessments when the language of assessment aligns with the language of instruction for the skill being assessed (e.g., Abedi, Lord, & Hofstetter, 1998; Kieffer, Lesaux, Rivera, & Francis, 2009). Therefore, if the outcome of interest is reading achievement and LM children have primarily received reading instruction in an English-language environment, English reading tests should be used to measure achievement, even if English is not the child’s L1. Thus, a child who is more proficient in Spanish than in English may have critical knowledge and skills that they are better able to express in English because that knowledge was acquired in an English-language educational environment. Consequently, low performance on Spanish-language assessments may reflect a lack of formal instructional support and opportunity to develop skills in L1 rather than a disability (Klingner, Artiles, & Barletta, 2006). Therefore, deciding that the language of assessment should be Spanish based on relative proficiency in English and Spanish may still yield biased test results. Additionally, simply translating an existing English-language assessment into the child’s L1 may not eliminate bias (Abedi, Hofstetter, & Lord, 2004). For example, word frequency influences the acquisition of words, and L1 and L2 translation equivalents may not occur with the same frequency in their respective languages, which poses problems for the direct translation of vocabulary assessments. Furthermore, some languages have different dialects (e.g., Mexican Spanish vs.
Cuban Spanish), and translated words may be more common in one dialect than in another, resulting in items that remain biased against specific groups of Spanish-speaking LM children. Given the myriad issues related to assessing LM children, it is important to consider the purpose of the assessment when determining the language of administration, as well as to consider whether the assessment is unbiased. When attempting to identify disability status, it is important to ensure that the assessment measures the construct of interest and that poor performance reflects a deficit in ability rather than limited language skills. For example, a child with poor L2 reading skills but age-appropriate L1 reading skills may be misidentified as having a specific learning disability if reading skills are assessed only in L2 (Abedi, 2004; Sullivan, 2011). In contrast, when the goal is to measure the level of academic skills in English in a relatively low-stakes context (e.g., simply to predict future development of English academic skills), it may be appropriate to assess children in the language of interest.
1.2. Within- and cross-language prediction of academic knowledge and skills
An important but still unanswered question in identifying effective tools for the early detection of Spanish-speaking LM children at risk for low academic performance is which measures best predict future skills. There is considerable evidence for the transfer of some reading-related skills from L1 to L2 (Lindsey, Manis, & Bailey, 2003; Manis, Lindsey, & Bailey, 2004; Proctor, August, Carlo, & Snow, 2006). However, much of this evidence is correlational and may reflect some other phenomenon rather than true transfer, such as a common language-learning environment across L1 and L2. Additionally, meta-analytic evidence indicates that longitudinal relations between academic skills are stronger within than across languages (see Prevoo, Malda, Mesman, & van IJzendoorn, 2016). In other words, English measures of literacy skills are better predictors of English-language literacy outcomes than are Spanish measures of literacy skills. In a longitudinal study of 249 Spanish-speaking LM children, Lindsey et al. (2003) reported that phonological awareness, letter-word knowledge, print concepts, and sentence memory were significantly related across Spanish and English. However, their data also suggested that other skills, such as vocabulary knowledge and reading comprehension, were not related across languages. Longitudinal examination of the same sample from kindergarten to second grade supported their initial findings that cross-language transfer is skill-specific and that associations between expressive language and later reading were stronger within than across languages (Manis et al., 2004). Another study reported that growth in Spanish or English receptive-language skills throughout preschool positively predicted English and Spanish reading skills at the end of kindergarten (Hammer, Lawrence, & Miccio, 2007). In contrast, other studies have reported that children’s Spanish language skills are not significant predictors of future English reading outcomes (Gottardo & Mueller, 2009; Proctor, Silverman, Harring, & Montecillo, 2012). Taken together, these findings suggest that although cross-language transfer may occur for some skills, English-language assessments are the best predictors of future English reading outcomes.
1.3. Differential item functioning
Although consideration of the purpose of assessment may lead to the conclusion that assessment in L1 is unnecessary, it is still important to ensure that any assessment used is not biased against specific groups of children (Muñoz, White, & Horton-Ikard, 2014). Evaluating the presence of differential item functioning (DIF) in the context of item response theory (IRT; Embretson & Reise, 2000) is one step in determining whether a particular item or assessment is biased for or against a given population.
The presence of DIF is a necessary but not sufficient condition for identifying item bias (Penfield & Lam, 2000). To demonstrate bias, an item must exhibit DIF as well as a systematic association with a construct other than the one it is intended to assess (i.e., the reason for the DIF is not relevant to the purpose of the assessment). For instance, items on a Spanish-language vocabulary assessment may show DIF between Spanish-speaking children from different regions of the U.S. because of dialect variation. Children who speak some dialects may receive less exposure to certain words, which makes them less likely to respond correctly to those words. Despite the presence of DIF, the items would not be biased if measuring dialect variation were one of the purposes of the assessment. In contrast, the items would be considered biased if the purpose of the assessment had nothing to do with dialect variation. In IRT, discrimination and difficulty parameters are estimated for each item on an assessment in relation to the underlying ability level (i.e., theta) of the test takers. The presence of DIF indicates that the discrimination parameter, the difficulty parameter, or both differ significantly across the groups examined. The discrimination parameter denotes the degree to which an item indexes the underlying construct or skill being assessed. An item with a high discrimination parameter differentiates well between individuals with lower and higher levels of the underlying skill, as indexed by theta, and items with higher discrimination parameters provide more precise estimates of theta than do items with lower discrimination parameters. Items that exhibit DIF on the discrimination parameter are differentially related to theta across the examined groups (Embretson & Reise, 2000) and therefore differentiate individuals with lower and higher levels of theta to different degrees across groups.
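To make the roles of the two parameters concrete, the sketch below evaluates the 2PL item characteristic curve, P(correct | theta) = 1 / (1 + exp(-a(theta - b))), for two hypothetical items that share a difficulty (b) but differ in discrimination (a). It assumes the plain logistic metric with no 1.7 scaling constant; the function name and parameter values are illustrative only and are not taken from the TOPEL calibration.

```python
import math


def icc_2pl(theta: float, a: float, b: float) -> float:
    """Probability of a correct response under a two-parameter logistic (2PL) model.

    a: discrimination (how steeply the curve rises around b)
    b: difficulty (the theta at which the probability of a correct response is .50)
    Uses the plain logistic metric (no 1.7 scaling constant), an assumption of this sketch.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Two hypothetical items with the same difficulty but different discrimination.
low_a, high_a, b = 0.5, 2.0, 0.0
for theta in (-1.0, 0.0, 1.0):
    p_low = icc_2pl(theta, low_a, b)
    p_high = icc_2pl(theta, high_a, b)
    print(f"theta={theta:+.1f}  a=0.5: {p_low:.3f}  a=2.0: {p_high:.3f}")
```

Both items give a probability of .50 at theta = b, which is what the difficulty parameter means. Away from b, the item with a = 2.0 changes much faster, so it separates lower- and higher-ability children more sharply than the item with a = 0.5.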
The difficulty parameter denotes the level of the underlying skill being assessed that is needed to get an item correct.
The difficulty parameter represents the level of theta at which an examinee has a 50% probability of answering the item correctly. Items that exhibit DIF on the difficulty parameter are those for which correct responses are associated with different levels of theta across the examined groups. That is, the level of underlying skill required for a 50% probability of answering the item correctly is higher for one group than for the other. Items exhibiting DIF on the difficulty parameter alone (i.e., uniform DIF) are potentially biased because, across the entire spectrum of ability (i.e., theta), they require a higher level of underlying ability for a correct response from members of one group than from members of another group. For example, if responding correctly to a vocabulary item required a higher level of underlying vocabulary skill for Spanish-speaking LM children than for monolingual English-speaking children, but the degree to which the item indexed vocabulary skills was relatively consistent for both groups (see Fig. 1a for an example), the item would demonstrate uniform DIF favoring monolingual English-speaking children. In contrast, if responding correctly to the vocabulary item required a higher level of underlying vocabulary skill for Spanish-speaking LM children than for monolingual English-speaking children only at some levels of underlying ability, both the difficulty and discrimination parameters would differ across groups (see Fig. 1b for an example), and the item would demonstrate nonuniform DIF because it would favor one group at one level of underlying vocabulary skill but the other group at another level (e.g., in Fig.
1b, at higher levels of theta the reference group has a higher probability of a correct response than does the focal group at the same level of theta, whereas at lower levels of theta it has a lower probability of a correct response than does the focal group). In instances of nonuniform DIF, DIF on the difficulty parameter is less problematic than it is for uniform DIF because it does not consistently favor members of one group over the other across the distribution of theta. Several studies have evaluated DIF by LM status for various assessments of academic knowledge and skills (Farrington, Lonigan, Phillips, Farver, & McDowell, 2015; Kim & Jang, 2009; Koo, Becker, & Kim, 2014; Mahoney, 2008; Martiniello, 2009; Rainelli, Bulotsky-Shearer, Fernandez, Greenfield, & López, 2017; Wolf & Leon, 2009). Most of these studies have focused on assessments designed to measure reading and mathematics achievement. Although some evidence suggests that DIF between LM children and monolingual English-speaking children on mathematics assessments is due to the language used on the assessments (e.g., Martiniello, 2009), Kieffer et al. (2009) reported that using simplified English as an assessment accommodation for LM students did not reduce achievement score gaps between LM and monolingual English-speaking students. Another study reported no DIF by LM status on the fourth-grade NAEP mathematics assessment, suggesting that existing mathematics achievement gaps cannot be explained by test bias (Mahoney, 2008).
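The uniform versus nonuniform distinction described above can be expressed directly in terms of 2PL parameters. If the two groups share a discrimination, their item characteristic curves never cross and one group is favored at every theta (uniform DIF); if the discriminations differ, the curves intersect at theta* = (a_R b_R - a_F b_F) / (a_R - a_F) and the favored group switches there (nonuniform DIF). The sketch below assumes group-specific (a, b) estimates are already available; it compares point estimates only and ignores sampling error, so it is not a substitute for a formal DIF test. The parameter values and function names are hypothetical, chosen to mimic the patterns in Fig. 1a and Fig. 1b.

```python
import math
from typing import Optional, Tuple

Params = Tuple[float, float]  # (a, b): discrimination, difficulty


def icc_2pl(theta: float, a: float, b: float) -> float:
    """2PL probability of a correct response (logistic metric, no 1.7 constant)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def classify_dif(ref: Params, foc: Params) -> str:
    """Label the pattern implied by point estimates: 'none', 'uniform', or 'nonuniform'."""
    same_a = math.isclose(ref[0], foc[0])
    same_b = math.isclose(ref[1], foc[1])
    if same_a:
        return "none" if same_b else "uniform"
    return "nonuniform"


def crossing_point(ref: Params, foc: Params) -> Optional[float]:
    """Theta at which the two ICCs intersect; None when discriminations are equal."""
    (a_r, b_r), (a_f, b_f) = ref, foc
    if math.isclose(a_r, a_f):
        return None  # parallel logits: the curves never cross (or coincide)
    return (a_r * b_r - a_f * b_f) / (a_r - a_f)


# Hypothetical reference/focal parameters mimicking Fig. 1a and Fig. 1b.
uniform_item = ((1.2, 0.0), (1.2, 0.5))
nonuniform_item = ((1.8, 0.2), (0.9, 0.0))

for label, (ref, foc) in (("1a", uniform_item), ("1b", nonuniform_item)):
    print(label, classify_dif(ref, foc), crossing_point(ref, foc))
    for theta in (-1.0, 1.0):
        favored = "reference" if icc_2pl(theta, *ref) > icc_2pl(theta, *foc) else "focal"
        print(f"  theta={theta:+.1f}: {favored} group favored")
```

For the uniform item, the reference group is favored at both theta values; for the nonuniform item, the curves cross near theta = 0.4, so the focal group is favored below that point and the reference group above it, as in Fig. 1b.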
DIF analyses of language and literacy assessments indicate that items assessing vocabulary knowledge frequently demonstrate DIF between monolingual English-speaking and LM students (Kim & Jang, 2009; Koo et al., 2014). However, few studies have evaluated DIF on assessments of LM preschoolers’ English language and literacy skills; two such studies reported relatively little DIF on early language and literacy measures across Spanish-speaking LM children and monolingual English-speaking children (Farrington et al., 2015; Rainelli et al., 2017).
1.4. Current study
Given the numerous concerns related to assessment of LM children and the significant within-language relations between earlier and later language and literacy skills, the purpose of this study was to examine DIF across monolingual English-speaking children and Spanish-speaking LM children on a standardized measure of English early literacy skills, the Test of Preschool Early Literacy (TOPEL; Lonigan, Wagner, Torgesen, & Rashotte, 2007). The TOPEL contains three subtests that evaluate children’s oral language, phonological awareness, and print knowledge skills (see Method section for a more detailed description of this measure). For the purposes of this study, all Spanish-speaking children, regardless of their proficiency in L1 (Spanish) and L2 (English), were classified as Spanish-speaking LM children. Based on prior research, we expected that, for items assessing vocabulary knowledge, monolingual English-speaking children would have a significantly higher probability of responding correctly than would Spanish-speaking LM children at the same level of theta (i.e., DIF on the difficulty parameter in favor of monolingual English-speaking children), because prior research examining young Spanish-speaking children’s vocabulary knowledge in L1 and L2 indicates that vocabulary is a relatively language-specific construct (e.g., Gottardo & Mueller, 2009). However, we did not expect DIF on items assessing other constructs such as phonological awareness and print knowledge. Previous studies of early literacy skills among Spanish-speaking LM children indicate that phonological awareness and print knowledge are relatively language-independent skills, whereas vocabulary knowledge is language-specific (Goodrich & Lonigan, 2017). Because phonological awareness is independent of meaning, children with strong phonological awareness skills in L1 should also have strong phonological awareness skills in L2 (e.g., Durgunoglu, Nagy, & Hancin-Bhatt, 1993; Cummins, 2017). Similarly, there is substantial overlap in letter names and letter-sound correspondences across English and Spanish; thus, phonological awareness and print knowledge items should function similarly for monolingual English-speaking children and Spanish-speaking LM children.
Fig. 1. Example item characteristic curves demonstrating uniform DIF (Fig. 1a) and non-uniform DIF (Fig. 1b) across two groups of examinees (represented by solid and dashed lines).
2. Method
2.1. Participants
Data for this study came from 1221 preschool children who were recruited as part of a curriculum evaluation study. Children were recruited from 74 preschool centers in diverse geographic regions of the United States, including Florida, New Mexico, southern California, and Kansas. The curriculum evaluation study was designed to include a large percentage of children who were classified as Spanish-speaking LM children. Consequently, 61.5% of children in this sample were identified as Spanish speakers,1 and data on child ethnicity indicated that the majority of all children in the sample were Hispanic/Latino (83.5%). According to parent report, 95.7% of Spanish-speaking students and 64% of monolingual English-speaking students were Hispanic/Latino. Preschool centers in which at least 50% of enrolled students were Spanish speakers were specifically targeted and recruited for this study; therefore, monolingual English-speaking and Spanish-speaking LM children in this sample came from the same preschool centers. Additionally, every preschool classroom had at least one Spanish-speaking practitioner, either the lead or the assistant teacher. One of the experimental manipulations implemented as part of the larger curriculum evaluation study involved language of instruction: for treatment classrooms, language of instruction was randomly assigned to be either English-only or initially Spanish, transitioning to English over the course of the preschool year. Data on language of instruction for classrooms in the control group were not available at the time of analysis for this study. At the end of the preschool year (the time point used for data analysis in this study), children ranged in age from 44 to 80 months (M = 61.27 months, SD = 4.68 months). Approximately half of the monolingual English-speaking (51.9%) and Spanish-speaking LM (49.5%) children in this sample were male. Data on race indicated that the majority of monolingual English-speaking (77.0%) and Spanish-speaking LM (97.3%) children were white; 14.9% of monolingual English-speaking and 1.3% of Spanish-speaking LM children were black/African-American; 5.1% of monolingual English-speaking and .4% of Spanish-speaking LM children were Asian; and the remainder were of other races or did not have race information reported.
For the purposes of this research study, children were identified as Spanish speakers near the beginning of the study based on a combination of teacher report, parent report, interaction with research assistants, and scores on Spanish early literacy assessments. The majority of children in this sample came from low-SES backgrounds. Approximately half (53.4%) of mothers reported having a high school education or less, and an additional 21.5% reported having graduated from college. Mean family income was reported to be between $25,000 and $30,000, and 80.5% of parents reported an annual family income of $50,000 or less.
1 We were unable to determine whether children were monolingual Spanish speakers or bilingual Spanish–English speakers. However, the average standard score on English Definitional Vocabulary for LM children in this sample was not approaching zero; thus, it is unlikely that children in this sample were monolingual Spanish speakers.
2.2. Measures
Children completed the three subtests of the TOPEL: Phonological Awareness, Print Knowledge, and Definitional Vocabulary. All TOPEL subtests are administered in English, and the TOPEL was developed and normed using a sample of 842 children that was representative of the demographic characteristics of the United States. Only responses in English are scored as correct. If children respond in a language other than English, they are prompted once to respond in English; if they still do not respond in English, the item is scored incorrect and testing proceeds with the next item. Internal consistency reliability of the TOPEL is high for each of the three subtests (alphas range from .86 to .96 for three- to five-year-old children), and convergent validity with criterion measures is high for each subtest (correlations range from .59 to .77). During development, items on the three TOPEL subtests were tested and selected from larger pools of items as the best indicators of the constructs they were intended to assess, as demonstrated by strong item-total correlations. Prior research has demonstrated that the three skills measured by the TOPEL subtests are distinct yet related early literacy abilities that are measurable during the preschool years and predictive of children’s future reading abilities (Storch & Whitehurst, 2002).
The Phonological Awareness subtest contains 27 items that assess children’s ability to detect and manipulate units of sound within words. Items on this subtest were designed to follow the developmental sequence of phonological awareness (e.g., Anthony & Francis, 2005); consequently, items span a range of linguistic complexity, requiring manipulation of sounds at the word, syllable, and phoneme levels. Twelve of the items on the Phonological Awareness subtest are elision items, and 15 are blending items.
Elision items require children to remove sounds from words to form a new word (e.g., removing /p/ from lamp to create lamb), and blending items require children to combine words or parts of words to form a new word (e.g., combining door and knob to create doorknob). Of the 12 elision items, six were multiple choice and six were free response. Multiple-choice items required children to point to the correct answer among four pictures (or say the correct response), and free-response items required children to say the correct response in the absence of pictures. Six elision items required children to remove whole words from compound words to create a new word, and six required children to remove phonemes from words to form a new word. Of the 15 blending items, six were multiple choice and nine were free response. Six blending items required children to combine two whole words to form a compound word, and nine required children to combine two or three sub-syllabic units of sound (e.g., individual phonemes, biphone segments) to create a word. Internal consistency reliability for the Phonological Awareness subtest was high in this sample of children (α = .90).
The Print Knowledge subtest contains 36 items that assess children’s knowledge of print concepts, letter and word discrimination, letter names, and letter-sound correspondence. Four items assessed knowledge of print concepts (e.g., which one shows the name of the book?) and eight items assessed letter and word discrimination (e.g., which is a letter? which can you read?); all 12 of these items were multiple choice. Sixteen items assessed knowledge of letter names, six of which were multiple choice (e.g., which one is “M”?) and 10 of which were free response (e.g., what is the name of this letter?). The remaining eight items assessed knowledge of letter-sound correspondence, four of which were multiple choice (e.g., which one makes the /b/ sound?) and four of which were free response (e.g., what sound does this letter make?). Internal consistency reliability for the Print Knowledge subtest was very high in this sample of children (α = .96).
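The internal consistency values reported for each subtest follow the usual Cronbach's alpha formula, alpha = k / (k - 1) * (1 - (sum of item variances) / (variance of total scores)), where k is the number of items. A minimal standard-library sketch, assuming a complete examinee-by-item score matrix with no missing responses; the function name and the toy data are made up for illustration and do not reproduce the values reported for this sample.

```python
from statistics import pvariance
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a complete examinee-by-item score matrix.

    scores[i][j] is examinee i's score on item j (e.g., 0/1 for a TOPEL item).
    Population variances are used throughout; the ratio is the same with sample variances.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    item_columns = list(zip(*scores))  # transpose rows (examinees) into item columns
    sum_item_var = sum(pvariance(col) for col in item_columns)
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return k / (k - 1) * (1 - sum_item_var / total_var)


# Toy 0/1 responses for four children on three items (made-up data).
toy = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(round(cronbach_alpha(toy), 3))  # prints 0.75
```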
The Definitional Vocabulary subtest contains 35 items, each with two components: a naming component and a definitional component. Therefore, the total possible score on this subtest is 70. The naming component is similar to items on an expressive vocabulary test and required children to name a picture (e.g., what is this?) or provide one name that applies to several pictures (e.g., what is a name for all of these?). The definitional component required children to describe a feature or function of the object pictured (e.g., what is it for? what does it do?). Children were administered the definitional component regardless of whether they answered the naming component correctly. Internal consistency reliability for the Definitional Vocabulary subtest was very high in this sample of children (α = .96).
2.3. Procedure
Institutional review board approval was obtained for this study. Parents or guardians provided written informed consent prior to children’s participation in the study, and assent was obtained from children prior to each testing session. As part of the curriculum evaluation study, children were assessed at the beginning and end of the preschool year. Data for this study came from the end of the preschool year because children completed only the Definitional Vocabulary subtest of the TOPEL at the beginning of the year. Trained research assistants administered the TOPEL, along with other measures in the larger curriculum-evaluation battery, in a quiet area of the child’s preschool center in sessions lasting between 20 and 40 min. Answers were coded as correct only if children responded in English. When children responded in Spanish, they were prompted to respond in English; if the subsequent answer was also given in Spanish, the item was scored as incorrect and testing proceeded with the next item.
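The English-only scoring rule and the two-component structure of the Definitional Vocabulary items can be sketched as follows. The `Attempt` record and the function names are hypothetical, and the sketch makes two assumptions not spelled out in the text: each component is scored 0 or 1 (consistent with the stated maximum of 70 for 35 items), and the single prompt to answer in English is applied per component. It illustrates the rule as described, not the TOPEL's published scoring procedure.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class Attempt:
    """One spoken answer (hypothetical record for illustration)."""
    correct: bool     # the content of the answer is correct
    in_english: bool  # the answer was given in English


def score_component(first: Attempt, after_prompt: Optional[Attempt] = None) -> int:
    """Score one naming or definitional component as 0 or 1 under the English-only rule.

    A non-English first answer triggers a single prompt to answer in English;
    only a correct English answer earns credit, otherwise the component scores 0.
    """
    if first.in_english:
        return int(first.correct)
    if after_prompt is not None and after_prompt.in_english:
        return int(after_prompt.correct)
    return 0


def definitional_vocabulary_raw(items: Iterable[Tuple[int, int]]) -> int:
    """Sum (naming, definitional) component scores; 35 items give a maximum of 70."""
    return sum(naming + definitional for naming, definitional in items)


# A correct Spanish answer earns credit only if repeated correctly in English after the prompt.
print(score_component(Attempt(True, False), Attempt(True, True)))   # 1
print(score_component(Attempt(True, False), Attempt(True, False)))  # 0
print(definitional_vocabulary_raw([(1, 1)] * 35))                   # 70
```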
For various reasons (e.g., children were absent on dates of testing), some children had missing data for some TOPEL subtests.

3. Results

3.1. Descriptive statistics

Average standard scores on the Phonological Awareness, F(1, 1210) = 9.59, p < .01, Print Knowledge, F(1, 1216) = 16.54, p < .001, and Definitional Vocabulary, F(1, 1206) = 187.23, p < .001, subtests of the TOPEL differed significantly between monolingual English-speaking and Spanish-speaking LM children. For both groups, scores on the Phonological Awareness (monolingual English-speaking M = 98.26, SD = 15.90; Spanish-speaking LM M = 95.12, SD = 17.87) and Print Knowledge (monolingual English-speaking M = 106.67, SD = 12.49; Spanish-speaking LM M = 103.43, SD = 14.11) subtests were in the average range. In contrast, scores on the Definitional Vocabulary subtest were in the average range for monolingual English-speaking children (M = 97.36, SD = 13.29) but in the below-average range for Spanish-speaking LM children (M = 83.90, SD = 18.47). Missing data were minimal: four children were missing all data for the Print Knowledge subtest, five for the Definitional Vocabulary subtest, and seven for the Phonological Awareness subtest.

3.2. Item response theory analysis

For each subtest, fit statistics were evaluated to determine whether a two-parameter logistic (2PL) model, in which both difficulty and discrimination parameters were estimated for each item, provided a better fit to the data than a one-parameter logistic (1PL; i.e., Rasch) model, in which discriminations were constrained to be equal across items and only difficulty parameters were estimated for each item. These analyses were conducted in Mplus Version 7.4 (Muthén & Muthén, 2015) using full information maximum likelihood with robust standard errors to account for any missing item-level data; detailed results are reported in the online supplemental materials (Table S1). For each TOPEL subtest, a 2PL model fit the data significantly better than a 1PL model, indicating that items on the TOPEL were not equally discriminating (i.e., both item discrimination and difficulty parameters were necessary to describe each subtest). Therefore, the 2PL models were used in all subsequent analyses. Item discrimination and difficulty parameters for all TOPEL subtests for the overall sample are reported in Tables S2–S4 in the Supplemental materials.

Test characteristic curves for monolingual English-speaking and Spanish-speaking LM children are shown in Fig. 2 for each subtest. Test characteristic curves show whether the estimated true test score is the same across groups for children with the same level of ability (theta). As can be seen in Fig. 2, the Phonological Awareness and Print Knowledge subtests of the TOPEL functioned similarly for monolingual English-speaking and Spanish-speaking LM children. For the naming component of items on the Definitional Vocabulary subtest, true test scores were higher for monolingual English-speaking children than for Spanish-speaking LM children with the same level of theta, especially at lower levels of theta. For the definitional component of items on the Definitional Vocabulary subtest, the test functioned similarly for both groups.

Fig. 2. Test characteristic curves for phonological awareness (2a), print knowledge (2b), and expressive (2c) and definitional vocabulary items (2d).
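The 2PL model and the test characteristic curves described above can be sketched in a few lines. This is an illustration, not the Mplus estimation used in the study. It assumes the logistic form P = 1 / (1 + exp(−a(θ − b))) without a 1.7 scaling constant (the metric of the reported parameters is not stated here), and it borrows the group-specific (a, b) estimates for the first three Phonological Awareness items from Table 1 only as example inputs. A test characteristic curve is the sum of the item curves, that is, the expected total score at each θ.

```python
import math

def icc_2pl(theta, a, b):
    """2PL item characteristic curve: probability of a correct response at
    ability theta for an item with discrimination a and difficulty b.
    (A 1PL model is the special case where a is equal for all items.)"""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def tcc(theta, items):
    """Test characteristic curve: expected (true) score at theta, i.e., the
    sum of the item probabilities over all (a, b) pairs in items."""
    return sum(icc_2pl(theta, a, b) for a, b in items)

# Example inputs only: (a, b) for Phonological Awareness Items 1-3 (Table 1).
mono_items = [(0.98, -2.48), (0.98, -2.12), (0.66, -1.59)]
lm_items = [(1.09, -2.06), (1.02, -1.88), (0.82, -1.41)]

for theta in (-3.0, -2.0, -1.0, 0.0):
    # Comparing expected scores at the same theta is what a TCC plot shows.
    print(f"theta={theta:+.1f}  monolingual={tcc(theta, mono_items):.2f}  "
          f"LM={tcc(theta, lm_items):.2f}")

# At theta == b the 2PL probability is exactly .50.
print(icc_2pl(-2.48, 0.98, -2.48))  # 0.5
```

Because a TCC only sums item curves, two groups can have similar TCCs even when individual items show small DIF in opposite directions, which is why the item-level analyses that follow are still needed.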
Estimates of the standard error of measurement for Spanish-speaking LM and monolingual English-speaking children at each level of theta are shown in Fig. 3 for each subtest. For the Phonological Awareness subtest, standard error of measurement for Spanish-speaking LM children was below .55 for a range of theta between −2.85 and 1.65, and for monolingual English-speaking children it was below .55 for a range of theta between −3.00 and 1.66. This range of standard error of measurement indicates that phonological awareness skills can be measured using the TOPEL with reliability greater than .70 (the conventional reliability threshold used in item response theory²) for approximately 95% of both monolingual English-speaking and Spanish-speaking LM children. For the Print Knowledge subtest, standard error of measurement for Spanish-speaking LM children was below .55 for a range of theta between −2.71 and 1.26, and for monolingual English-speaking children it was below .55 for a range of theta between −3.00 and 1.07. This indicates that print knowledge skills can be measured using the TOPEL with reliability greater than .70 for approximately 90% of Spanish-speaking LM children and approximately 86% of monolingual English-speaking children. For the naming component of the Definitional Vocabulary subtest, standard error of measurement for Spanish-speaking LM children was below .55 for a range of theta between −2.70 and 1.97, and for monolingual English-speaking children it was below .55 for a range of theta between −3.00 and 1.34. This indicates that expressive vocabulary knowledge can be measured using the TOPEL with reliability greater than .70 for approximately 98% of Spanish-speaking LM children and approximately 91% of monolingual English-speaking children. For the definitional component of the Definitional Vocabulary subtest, standard error of measurement for Spanish-speaking LM children was below .55 for a range of theta between −2.34 and 1.98, and for monolingual English-speaking children it was below .55 for a range of theta between −3.00 and 1.25. This indicates that definitional vocabulary knowledge can be measured using the TOPEL with reliability greater than .70 for approximately 97% of Spanish-speaking LM children and approximately 90% of monolingual English-speaking children.

Fig. 3. Standard error of measurement for phonological awareness (3a), print knowledge (3b), and expressive (3c) and definitional vocabulary items (3d). Horizontal line is equivalent to a test reliability of .70 in classical test theory.

² Use of a reliability threshold of .80 yielded reliable measurement of 88% of LM children and 89% of monolingual English-speaking children for phonological awareness, 82% of LM children and 77% of monolingual English-speaking children for print knowledge, 95% of LM children and 75% of monolingual English-speaking children for expressive vocabulary, and 93% of LM children and 78% of monolingual English-speaking children for definitional vocabulary.
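The correspondence between the .55 standard-error cut-off and a reliability of .70, and the "approximately X% of children" figures, can be reproduced under two assumptions. First, θ is scaled to variance 1, so reliability ≈ 1 − SE². Second, ability is standard-normally distributed, so the share of children covered is the normal probability between the two θ bounds. A minimal sketch (function names are illustrative):

```python
import math

def reliability_from_se(se, theta_var=1.0):
    """Reliability implied by a conditional standard error of measurement,
    assuming theta is scaled to variance theta_var (1.0 by convention)."""
    return 1.0 - se ** 2 / theta_var

def normal_cdf(x):
    """Standard-normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def share_in_range(lo, hi):
    """Share of a standard-normal ability distribution between lo and hi."""
    return normal_cdf(hi) - normal_cdf(lo)

print(round(reliability_from_se(0.55), 3))   # about .70
print(round(math.sqrt(1.0 - 0.80), 3))       # SE needed for .80 reliability
# Phonological Awareness, Spanish-speaking LM children: SE < .55 on [-2.85, 1.65]
print(round(share_in_range(-2.85, 1.65), 3))
```

Under these assumptions, the Phonological Awareness range for Spanish-speaking LM children covers roughly 95% of a standard-normal population, in line with the figure reported above, and the .80 threshold in footnote 2 corresponds to an SE of about .45.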
3.3. Differential item functioning analysis

To obtain estimates of DIF, separate IRT parameters were estimated for monolingual English-speaking and Spanish-speaking LM children using IRTLRDIF (Thissen, 2001). This software uses maximum likelihood estimation to obtain parameter estimates and the likelihood ratio test to evaluate the statistical significance of DIF for each parameter in a 2PL model. In each likelihood ratio test, DIF for an individual item is tested using all other items on the test as the “anchor.” Because of the large number of statistical tests conducted, the Benjamini-Hochberg correction (Benjamini & Hochberg, 1995) was used to control the false discovery rate across tests of DIF within each subtest. Because significance tests of DIF are sensitive to sample size, effect size estimates of DIF were also generated using VisualDF (Meade, 2010). Effect size estimates provided by VisualDF are analogous to Cohen's d, with .20, .50, and .80 representing small, medium, and large effects, respectively.
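The two testing steps described above can be sketched as follows. The first is a likelihood ratio statistic G² = 2(log L_free − log L_constrained), referred to a χ² distribution. The second is the Benjamini-Hochberg step-up procedure applied to the resulting p-values within a subtest. This is a simplified illustration rather than IRTLRDIF's implementation: the log-likelihoods and p-values are hypothetical, and the closed-form χ² tail probabilities used here cover only 1 and 2 degrees of freedom.

```python
import math

def lr_pvalue(loglik_constrained, loglik_free, df):
    """p-value for a likelihood ratio test of DIF.

    G2 = 2 * (logL_free - logL_constrained), where the free model lets the
    studied item's parameter(s) differ between groups and the constrained
    model holds them equal. Closed-form chi-square tails for df = 1 or 2 only.
    """
    # Clamp at zero: the free model should never fit worse than the constrained one.
    g2 = max(2.0 * (loglik_free - loglik_constrained), 0.0)
    if df == 1:
        return math.erfc(math.sqrt(g2 / 2.0))
    if df == 2:
        return math.exp(-g2 / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")

def benjamini_hochberg(pvalues, q=0.05):
    """Benjamini-Hochberg step-up procedure controlling the false discovery
    rate at level q. Returns True for each hypothesis that is rejected."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest rank k whose sorted p-value satisfies p_(k) <= (k / m) * q.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * q:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected

# Hypothetical p-values for six items within one subtest.
pvals = [0.001, 0.008, 0.039, 0.041, 0.27, 0.60]
print(benjamini_hochberg(pvals))  # [True, True, False, False, False, False]
```

Note that the third and fourth p-values would pass an unadjusted .05 criterion but are not rejected once the false discovery rate is controlled across the six tests.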
In addition to effect sizes, VisualDF provides signed (SID) and unsigned (UID) item difference values, which summarize the overall magnitude of DIF between two groups. SID values represent the average difference in effect sizes across all levels of theta. In the event of non-uniform DIF (i.e., DIF on the discrimination parameter), differences in opposite directions at different levels of theta can cancel each other out, yielding SID values near zero. In contrast, UID values are based on the absolute difference between groups at each level of theta, so different directions of DIF at different levels of ability do not cancel each other out.

Results of the DIF analyses for the Phonological Awareness subtest are shown in Table 1. No items had significant DIF on the discrimination parameter. Two items had significant DIF on the difficulty parameter after correction for multiple comparisons. One of these items, which assessed children's ability to blend words to create a compound word, showed a large degree of DIF in favor of monolingual English-speaking children; the other, which assessed children's ability to blend individual phonemes to create a word, showed a small degree of DIF in favor of Spanish-speaking LM children.

Results of the DIF analyses for the Print Knowledge subtest are shown in Table 2. No items had significant DIF on the discrimination parameter. Four items had significant DIF on the difficulty parameter after correction for multiple comparisons. On each of these items, Spanish-speaking LM children were significantly less likely to respond correctly than were monolingual English-speaking children with the same level of ability. One item that measured children's knowledge of print concepts exhibited a moderate degree of DIF, whereas three letter-knowledge items exhibited small degrees of DIF.
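Why signed differences can cancel under non-uniform DIF is easiest to see numerically. The sketch below averages the difference between two groups' 2PL curves over a standard-normal ability distribution, once with sign and once in absolute value. It is a simplified analogue of the signed and unsigned item differences described above, under stated assumptions (standard-normal weighting, logistic metric, hypothetical item parameters), and it is not VisualDF's exact computation.

```python
import math

def icc_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def signed_unsigned_difference(ref, foc, lo=-6.0, hi=6.0, steps=1200):
    """Average signed and unsigned differences between two groups' item
    characteristic curves, weighted by a standard-normal ability density.

    ref, foc: (a, b) tuples for the reference and focal groups.
    Simplified illustration only, not VisualDF's SID/UID computation.
    """
    h = (hi - lo) / steps
    signed = unsigned = total_w = 0.0
    for i in range(steps + 1):
        theta = lo + i * h
        w = math.exp(-0.5 * theta * theta)  # unnormalised normal density
        d = icc_2pl(theta, *ref) - icc_2pl(theta, *foc)
        signed += w * d
        unsigned += w * abs(d)
        total_w += w
    return signed / total_w, unsigned / total_w

# Hypothetical non-uniform DIF: equal difficulty, different discrimination.
sid, uid = signed_unsigned_difference(ref=(1.5, 0.0), foc=(0.5, 0.0))
print(round(sid, 4), round(uid, 4))

# Hypothetical uniform DIF: equal discrimination, shifted difficulty.
sid_u, uid_u = signed_unsigned_difference(ref=(1.0, 0.0), foc=(1.0, 0.5))
print(round(sid_u, 4), round(uid_u, 4))
```

With equal difficulties and unequal discriminations, the two curves cross at θ = 0, so the signed average is essentially zero while the unsigned average is clearly positive. When the curves differ only in difficulty, one curve lies above the other everywhere and the two summaries coincide.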
Table 1
IRT parameters for Spanish-speaking LM and monolingual English-speaking children and effect sizes of DIF analysis for phonological awareness items.

Item | Monolingual a | Monolingual b | LM a | LM b | ES | SID | UID
1 | .98 | −2.48 | 1.09 | −2.06 | −.22 | −.03 | .03
2 | .98 | −2.12 | 1.02 | −1.88 | −.16 | −.02 | .02
3 | .66 | −1.59 | .82 | −1.41 | .04 | .01 | .02
4 | .75 | −.90 | .76 | −.72 | −.16 | −.03 | .03
5 | .90 | −1.22 | .64 | −1.67 | .11 | .02 | .04
6 | .67 | −.81 | .51 | −1.03 | .05 | .01 | .03
7 | 1.83 | −.58 | 1.45 | −.56 | −.06 | −.02 | .04
8 | 1.98 | −.46 | 1.46 | −.30 | −.17 | −.05 | .06
9 | 1.95 | −.30 | 1.68 | −.30 | −.01 | −.00 | .02
10 | 1.66 | .45 | 1.30 | .56 | −.02 | −.01 | .04
11 | 1.60 | .58 | 1.13 | .85 | −.09 | −.02 | .05
12 | 1.92 | .61 | 1.44 | .61 | .09 | .02 | .04
13 | .92 | −3.07 | .93 | −2.86 | −.15 | −.01 | .01
14 | .92 | −3.42 | 1.17 | −2.59 | −.27 | −.03 | .03
15 | .85 | −1.74 | .59 | −1.24 | −.81 | −.12 | .12
16 | 1.43 | −1.87 | 1.19 | −1.63 | −.33 | −.06 | .06
17 | .92 | −2.44 | 1.30 | −1.92 | −.06 | −.01 | .03
18 | 1.09 | −1.72 | 1.28 | −1.52 | −.05 | −.01 | .02
19 | 2.20 | −1.29 | 2.36 | −1.28 | .01 | .00 | .01
20 | 1.99 | −1.27 | 2.50 | −1.09 | −.08 | −.02 | .03
21 | 2.80 | −1.13 | 2.25 | −1.06 | −.10 | −.03 | .04
22 | 2.29 | −.31 | 1.98 | −.34 | .02 | .01 | .02
23 | 1.47 | −.01 | 1.39 | −.09 | .08 | .02 | .02
24 | 2.52 | −.34 | 2.71 | −.38 | .04 | .01 | .02
25 | 2.76 | −.17 | 2.90 | −.30 | .12 | .04 | .04
26 | 2.68 | .28 | 2.53 | .14 | .13 | .04 | .04
27 | 2.32 | .30 | 2.58 | .04 | .22 | .07 | .08

Note: LM = Spanish-speaking language-minority children. ES = effect size. SID = signed item difference. UID = unsigned item difference. Parameters that are significantly different from each other after correction for Type-I error rate and the corresponding effect sizes are bolded.
Results of DIF analyses for items on the Definitional Vocabulary subtest are shown in Table 3 (naming component of items) and Table 4 (definitional component of items). Six items (Items 12, 13, 22, 27, 32, 33) exhibited significant DIF on both the naming and definitional components. Significant DIF emerged on the naming component for an additional nine items and on the definitional component for an additional three items. In the majority of cases, DIF occurred on the difficulty parameter; however, significant DIF on the discrimination parameter occurred for three naming and two definitional components (for Item 32, both the naming and definitional components exhibited significant DIF on the discrimination parameter). Of the 12 instances in which the naming component exhibited significant DIF on the difficulty parameter, eight favored monolingual English-speaking children and four favored Spanish-speaking LM children. In contrast, of the seven instances in which the definitional component exhibited significant DIF on the difficulty parameter, two favored monolingual English-speaking children and five favored Spanish-speaking LM children. Of the five instances of DIF on the discrimination parameter, three were more discriminating for monolingual English-speaking children and two were more discriminating for Spanish-speaking LM children. In general, effect size estimates for DIF on the difficulty parameter were in the small to moderate range, although for Item 31 the effect size estimate did not reach Cohen's (1988) threshold for a small effect, despite statistical significance.³

³ Effect size estimates for non-uniform DIF were not evaluated, as they frequently cancel each other out.
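Under the 2PL model, an item's difficulty b is the ability level at which the probability of a correct response is .50. A shift in b between groups (uniform DIF) therefore means one group needs more ability to reach that 50% point. The sketch below uses the naming-component estimates for Definitional Vocabulary Item 10 from Table 3 purely as example inputs, chosen because its difficulty estimates differ visibly between groups. It assumes the logistic metric without a 1.7 scaling constant, and the function names are illustrative.

```python
import math

def icc_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def theta_at_probability(p, a, b):
    """Invert the 2PL curve: the ability at which P(correct) equals p.
    For p = .5 the log-odds term is zero, so the answer is simply b."""
    return b + math.log(p / (1.0 - p)) / a

# Example inputs: naming component of Definitional Vocabulary Item 10 (Table 3).
mono = (1.21, -1.50)  # (a, b) for monolingual English-speaking children
lm = (1.27, -0.59)    # (a, b) for Spanish-speaking LM children

print(theta_at_probability(0.5, *mono))  # -1.5
print(theta_at_probability(0.5, *lm))    # -0.59
# Probability of a correct naming response for an LM child whose ability
# equals the monolingual 50% point (theta = -1.50).
print(round(icc_2pl(mono[1], *lm), 3))
```

Under these assumptions, a Spanish-speaking LM child at the ability level where a monolingual child has even odds would answer this naming component correctly only about one time in four.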
Table 2
IRT parameters for Spanish-speaking LM and monolingual English-speaking children and effect sizes of DIF analysis for print knowledge items.

Item | Monolingual a | Monolingual b | LM a | LM b | ES | SID | UID
1 | 1.28 | −2.38 | 1.16 | −1.83 | −.61 | −.08 | .08
2 | .91 | −1.73 | .92 | −1.69 | −.03 | .00 | .00
3 | .66 | −1.24 | .77 | −.85 | −.27 | −.04 | .04
4 | .79 | −.21 | .68 | .11 | −.32 | −.05 | .05
5 | 1.76 | −.85 | 1.68 | −.88 | .01 | .00 | .01
6 | 2.18 | −.73 | 1.89 | −.85 | .08 | .02 | .03
7 | 2.19 | −.60 | 1.93 | −.69 | .06 | .02 | .02
8 | 1.10 | −.41 | 1.15 | −.48 | .08 | .02 | .02
9 | 1.92 | −.44 | 1.80 | −.44 | −.01 | .00 | .01
10 | 1.86 | −.26 | 2.00 | −.30 | .04 | .01 | .02
11 | 1.69 | −.27 | 1.77 | −.24 | −.03 | −.01 | .01
12 | 1.93 | −.22 | 1.94 | −.25 | .03 | .01 | .01
13 | 2.89 | −1.60 | 2.11 | −1.83 | .05 | .01 | .02
14 | 1.85 | −1.93 | 2.32 | −1.44 | −.29 | −.05 | .05
15 | 3.51 | −1.43 | 2.52 | −1.22 | −.26 | −.06 | .06
16 | 2.32 | −1.54 | 2.24 | −1.24 | −.27 | −.06 | .06
17 | 3.71 | −1.29 | 3.47 | −1.27 | −.03 | −.01 | .01
18 | 2.55 | −1.08 | 2.68 | −1.08 | .01 | .00 | .01
19 | 2.38 | −1.05 | 2.19 | −1.05 | −.02 | −.01 | .01
20 | 2.48 | −.93 | 2.69 | −1.09 | .16 | .05 | .05
21 | 3.47 | −.71 | 3.23 | −.87 | .13 | .05 | .05
22 | 3.31 | −.90 | 3.42 | −1.02 | .11 | .03 | .03
23 | 2.32 | −1.14 | 2.19 | −1.00 | −.14 | −.04 | .03
24 | 2.52 | −1.19 | 2.45 | −1.12 | −.07 | −.02 | .02
25 | 2.63 | −1.10 | 2.30 | −.93 | −.19 | −.05 | .05
26 | 3.56 | −.85 | 3.21 | −.85 | −.01 | .00 | .01
27 | 4.22 | −.94 | 3.28 | −.95 | −.02 | −.01 | .02
28 | 3.37 | −.72 | 3.04 | −.78 | .04 | .01 | .02
29 | 3.67 | −.71 | 4.40 | −.74 | .04 | .01 | .02
30 | 2.90 | −.68 | 3.74 | −.73 | .07 | .02 | .03
31 | 3.96 | −.58 | 4.27 | −.55 | −.02 | −.01 | .01
32 | 1.74 | .28 | 1.56 | .20 | .09 | .03 | .03
33 | 4.18 | −.52 | 3.96 | −.55 | .02 | .01 | .01
34 | 5.51 | −.58 | 7.11 | −.59 | .02 | .01 | .02
35 | 5.48 | −.46 | 4.77 | −.48 | .01 | .01 | .01
36 | 4.80 | −.42 | 4.08 | −.42 | −.01 | .00 | .02

Note: LM = Spanish-speaking language-minority children. ES = effect size. SID = signed item difference. UID = unsigned item difference. Parameters that are significantly different from each other after correction for Type-I error rate and the corresponding effect sizes are bolded.
4. Discussion

The purpose of this study was to evaluate whether items on the Phonological Awareness, Print Knowledge, and Definitional Vocabulary subtests of the TOPEL displayed DIF when comparing monolingual English-speaking and Spanish-speaking LM children. In general, the results of this study were consistent with hypotheses. Little evidence of DIF emerged for phonological awareness and print knowledge items; however, approximately 34% of definitional vocabulary item components (24 of the 70 naming and definitional components) displayed DIF. The majority of these displayed uniform DIF (i.e., DIF on only the difficulty parameter), indicating that DIF was in a consistent direction across the full range of theta. Given that language input and vocabulary knowledge are distributed across two languages for Spanish-speaking LM children (e.g., Peña, Bedore, & Zlatic-Giunta, 2002), it was expected that items on the Definitional Vocabulary subtest would show DIF in favor of monolingual English-speaking children. In other words, monolingual English-speaking children would require lower levels of underlying ability than Spanish-speaking LM children to have a 50% chance of giving the correct response on definitional vocabulary items. Results for the naming components of items were consistent with this hypothesis; however, the majority of definitional components of items that displayed uniform DIF favored Spanish-speaking LM children. Overall, these findings have implications for the use of the TOPEL with Spanish-speaking LM children.

Table 3
IRT parameters for Spanish-speaking LM and monolingual English-speaking children and effect sizes of DIF analysis for the naming component of definitional vocabulary items.

Item | Monolingual a | Monolingual b | LM a | LM b | ES | SID | UID
1 | 1.56 | −3.11 | 1.10 | −2.76 | −.38 | −.07 | .07
2 | .90 | −4.11 | 1.09 | −2.60 | −.67 | −.11 | .11
3 | 1.32 | −3.98 | .67 | −5.91 | .08 | .01 | .03
4 | 1.27 | −3.67 | 1.13 | −3.65 | −.10 | −.01 | .01
5 | 1.01 | −3.88 | .87 | −3.56 | −.36 | −.04 | .04
6 | 1.49 | −3.11 | 1.50 | −2.90 | −.10 | −.02 | .02
7 | 1.46 | −3.24 | 1.21 | −2.67 | −.41 | −.08 | .08
8 | 1.67 | −2.82 | 1.26 | −3.10 | .02 | .00 | .03
9 | 1.80 | −2.92 | 1.29 | −3.07 | −.06 | −.01 | .03
10 | 1.21 | −1.50 | 1.27 | −.59 | −.69 | −.19 | .19
11 | 1.22 | −3.92 | 1.68 | −2.57 | −.48 | −.09 | .09
12 | .65 | −5.93 | 1.47 | −2.67 | −.63 | −.10 | .10
13 | 1.32 | −1.02 | 1.17 | −.49 | −.43 | −.12 | .12
14 | 1.16 | −3.57 | 1.41 | −2.76 | −.31 | −.05 | .05
15 | .90 | −4.69 | 1.39 | −2.75 | −.58 | −.09 | .09
16 | 1.14 | −1.15 | 1.15 | −.97 | −.14 | −.04 | .04
17 | 2.13 | −1.47 | 1.56 | −2.00 | .24 | .07 | .08
18 | 1.21 | −1.66 | 1.21 | −1.68 | .01 | .00 | .00
19 | 1.01 | .05 | .95 | .96 | −.71 | −.14 | .14
20 | 2.24 | −1.84 | 2.50 | −1.85 | .03 | .01 | .01
21 | 1.21 | −1.13 | 1.74 | −.98 | −.05 | −.02 | .05
22 | 1.47 | −.65 | 1.48 | −.27 | −.31 | −.09 | .09
23 | 2.34 | −1.03 | 1.61 | −.97 | −.09 | −.03 | .06
24 | 3.29 | −1.62 | 2.77 | −1.67 | .01 | .00 | .02
25 | 2.34 | −.92 | 2.27 | −.93 | .01 | .00 | .00
26 | 1.68 | .54 | 1.11 | .07 | .57 | .13 | .13
27 | .92 | −.21 | 1.08 | −.70 | .38 | .09 | .09
28 | 1.50 | −.65 | 1.72 | −.42 | −.19 | −.06 | .06
29 | 1.43 | −.94 | 1.66 | −.47 | −.37 | −.11 | .11
30 | 1.93 | .46 | 1.52 | .49 | .07 | .02 | .03
31 | 2.76 | −.73 | 3.05 | −.92 | .15 | .06 | .06
32 | 1.50 | 2.25 | .52 | 5.32 | .24 | .01 | .03
33 | 1.96 | .30 | 1.66 | −.07 | .35 | .10 | .10
34 | 1.18 | 1.39 | 1.44 | .97 | .17 | .03 | .03
35 | 1.43 | 1.64 | .87 | 2.66 | −.09 | −.01 | .03

Note: LM = Spanish-speaking language-minority children. ES = effect size. SID = signed item difference. UID = unsigned item difference. Parameters that are significantly different from each other after correction for Type-I error rate and the corresponding effect sizes are bolded.

Table 4
IRT parameters for Spanish-speaking LM and monolingual English-speaking children and effect sizes of DIF analysis for the definitional component of definitional vocabulary items.

Item | Monolingual a | Monolingual b | LM a | LM b | ES | SID | UID
1 | 1.69 | −2.80 | 1.26 | −3.22 | .09 | .02 | .03
2 | .66 | −3.00 | .78 | −2.77 | .03 | .00 | .02
3 | .88 | −3.40 | 1.10 | −2.70 | −.25 | −.04 | .04
4 | .70 | −2.81 | .74 | −2.87 | .11 | .02 | .02
5 | 1.46 | −2.41 | 1.30 | −2.41 | −.05 | −.01 | .02
6 | .80 | −2.66 | 1.04 | −2.30 | −.04 | −.01 | .03
7 | 1.42 | −2.05 | .87 | −3.17 | .38 | .08 | .08
8 | .96 | −1.91 | 1.01 | −1.74 | −.10 | −.02 | .02
9 | .83 | −2.09 | .74 | −2.67 | .32 | .06 | .06
10 | .98 | −.58 | .93 | −.46 | −.09 | −.02 | .02
11 | 1.19 | −1.63 | 1.62 | −1.24 | −.19 | −.05 | .06
12 | 1.19 | −.82 | 1.01 | −1.44 | .44 | .11 | .11
13 | 1.30 | −1.55 | 1.51 | −.88 | −.47 | −.13 | .13
14 | 1.29 | −1.20 | .92 | −1.60 | .19 | .05 | .06
15 | 1.85 | −.59 | 1.18 | −.86 | .19 | .06 | .08
16 | 1.16 | −2.06 | 1.39 | −1.90 | −.03 | −.01 | .02
17 | 1.45 | −1.03 | 1.41 | −1.26 | .17 | .05 | .05
18 | 1.35 | −.25 | 1.12 | .10 | −.25 | −.06 | .06
19 | 1.29 | −1.25 | 1.44 | −.88 | −.26 | −.08 | .08
20 | 1.93 | −1.01 | 2.07 | −1.48 | .34 | .11 | .11
21 | .88 | −.45 | 1.07 | −.71 | .20 | .05 | .05
22 | 1.15 | −.79 | 1.40 | −1.02 | .21 | .06 | .06
23 | 1.43 | −.76 | 1.57 | −.88 | .10 | .03 | .03
24 | 1.27 | .34 | .96 | .29 | .19 | .04 | .05
25 | 1.86 | −.03 | 1.32 | .13 | −.05 | −.01 | .05
26 | 2.69 | −1.50 | 3.36 | −1.62 | .09 | .03 | .04
27 | 2.04 | −.67 | 2.20 | −1.04 | .30 | .10 | .10
28 | 1.52 | −.59 | 1.69 | −.81 | .18 | .06 | .06
29 | 1.34 | −.87 | 1.64 | −.78 | −.05 | −.02 | .03
30 | 1.49 | .09 | 1.47 | .06 | .03 | .01 | .01
31 | 1.81 | −.52 | 1.99 | −.64 | .10 | .03 | .03
32 | 1.70 | −1.18 | 2.71 | −1.04 | −.03 | −.01 | .06
33 | 1.04 | .35 | 1.20 | −.16 | .38 | .09 | .09
34 | 1.39 | −.12 | 1.43 | −.16 | .03 | .01 | .01
35 | 1.47 | .89 | 1.47 | 1.03 | −.12 | −.02 | .02

Note: LM = Spanish-speaking language-minority children. ES = effect size. SID = signed item difference. UID = unsigned item difference. Parameters that are significantly different from each other after correction for Type-I error rate and the corresponding effect sizes are bolded.
4.1. Phonological awareness and print knowledge

Consistent with hypotheses, few phonological awareness and print knowledge items displayed evidence of DIF. Both phonological awareness and print knowledge are relatively language-independent skills (Goodrich & Lonigan, 2017). Phonological awareness involves children's ability to break large units of sound into smaller units of sound, independent of meaning. Children who are able to manipulate units of sound in L1 should also be able to do so in L2, assuming that they can accurately detect the sounds of L2. Similarly, there is a large degree of overlap in letter-sound correspondence across English and Spanish, and many letter names are similar across the two languages. Prior research indicates that cross-language correlations for print knowledge (e.g., Goodrich, Lonigan, & Farver, 2013; Lindsey et al., 2003) and phonological awareness skills (e.g., Branum-Martin, Tao, Garnaat, Bunta, & Francis, 2012; Melby-Lervåg & Lervåg, 2011) are moderate to large, supporting the language-independent nature of these constructs. Therefore, it is unsurprising that print knowledge and phonological awareness items operated similarly for Spanish-speaking LM and monolingual English-speaking children.

4.2. Definitional vocabulary

As expected, the Definitional Vocabulary subtest displayed more uniform DIF than the other subtests of the TOPEL. However, DIF did not always favor monolingual English-speaking children. For the naming components of definitional vocabulary items, the majority of items that displayed uniform DIF favored monolingual English-speaking children. In contrast, for the definitional components, the majority of items that displayed uniform DIF favored Spanish-speaking LM children. The latter finding was unexpected and warrants further consideration. Among the naming components that exhibited DIF in favor of monolingual English-speaking children, approximately half involved words related to the home environment, such as bed and stove. Reduced familiarity with and use of these words in English could explain these findings, as Spanish-speaking LM children may be more frequently exposed to the Spanish labels for objects that are strongly associated with the home context. Oller, Pearson, and Cobo-Lewis (2007) posited that the unequal distribution of vocabulary knowledge across the two languages typically found in Spanish-speaking LM children is linked to the locus of language acquisition (i.e., whether vocabulary is learned in the home or the school context). Thus, Spanish-speaking LM children may be more likely to have a richer vocabulary for household items and activities in Spanish, whereas school-related words are more likely to be acquired exclusively in English.

Interestingly, all of the naming components of definitional vocabulary items that exhibited DIF in favor of Spanish-speaking LM children required children to identify the superordinate category that applied to a set of four pictures (e.g., identify the label fruit from pictures of an apple, banana, orange, and strawberry). One possible explanation is that Spanish-speaking LM children may have been better at identifying Spanish-English cognates, or words that share similar etymology, orthography, and meaning (Nash, 1997), than English words that share no commonalities with their Spanish counterparts. Because a high proportion of general academic vocabulary words are Latin-based, they are frequently cognates across Spanish and English. Spanish-speaking LM children may be able to use knowledge of a word in L1 to facilitate the acquisition of its L2 counterpart, especially if the words have clear orthographic overlap (Goodrich et al., 2013; Lubliner & Hiebert, 2011; Nagy, García, Durgunoglu, & Hancin-Bhatt, 1993). Thus, it is possible that the Spanish-speaking LM children in our sample were better able to identify superordinate categories such as fruit and animals because of their near-identical etymology, orthography, and meaning in Spanish (i.e., frutas; animales). Another explanation is that Spanish-speaking LM children may not yet have acquired the individual English vocabulary words within the superordinate category but instead know the global term for the group. Because the scoring of the measure awards points only if the child provides the superordinate term (e.g., “animal” rather than “cat” or “dog”), evidence of DIF in favor of Spanish-speaking LM children could be due to the scoring strategy.
For instance, because monolingual English-speaking children had stronger English language skills than did Spanish-speaking LM children, their first inclination may have been to state the names of the objects depicted in the pictures instead of naming the superordinate term. Spanish-speaking LM children might not have known the name of each object in English but did know the superordinate term, and may therefore have been more likely to use the superordinate term rather than name each individual item pictured. Further research is needed to uncover the mechanisms through which definitional vocabulary items operate differently for Spanish-speaking LM and monolingual English-speaking children.

In contrast to the naming components of Definitional Vocabulary items, for which the majority of DIF favored monolingual English-speaking children, items that exhibited DIF on the definitional components generally favored Spanish-speaking LM children. Given prior findings that vocabulary items typically demonstrate DIF in favor of monolingual English-speaking students (e.g., Kim & Jang, 2009), this finding was unexpected. One potential explanation for the discrepancy between the naming and definitional components is that Spanish-speaking LM children may be able to describe a feature or function of an object without knowing its label. For example, Spanish-speaking LM children may be able to convey that a bird is an animal that flies without knowing the English word bird. In this instance, children would provide a correct answer for the definitional but not the naming component of the item.
Although items that assess conceptual knowledge and do not require knowledge of specific lexical labels may demonstrate less DIF in favor of monolingual English-speaking children, this explanation does not account for the observed DIF in favor of Spanish-speaking LM children on the definitional components of items. It is unclear what could be driving these results, as examination of the specific items that demonstrated DIF in favor of Spanish-speaking LM children did not reveal characteristics common across those items.

4.3. Implications

Overall, the results of this study indicate that the TOPEL can be used with Spanish-speaking LM children with little concern regarding test bias, especially for the Phonological Awareness and Print Knowledge subtests. There was a moderate amount of uniform DIF in favor of monolingual English-speaking children for the naming components of definitional vocabulary items; however, when uniform DIF was examined across both the naming and definitional components, there was no consistent pattern of uniform DIF in favor of monolingual English-speaking children (i.e., 10 items displayed uniform DIF in favor of monolingual English-speaking children and nine items displayed uniform DIF in favor of Spanish-speaking LM children). Moreover, when DIF was present, effect sizes were small to medium in most cases. Because observed DIF on the Definitional Vocabulary subtest did not consistently favor monolingual English-speaking children, and frequently favored Spanish-speaking LM children, scores on the Definitional Vocabulary subtest of the TOPEL should still provide a relatively unbiased estimate of Spanish-speaking LM children's vocabulary skills. These results are consistent with findings from prior studies that report more DIF for vocabulary items than for other item types on assessments of literacy-related skills (Kim & Jang, 2009; Koo et al., 2014), as well as studies that have reported relatively little DIF on measures of early literacy skills among Spanish-speaking LM children (Farrington et al., 2015; Rainelli et al., 2017).

Results indicated that the reliability of the Phonological Awareness subtest was nearly identical for monolingual English-speaking and Spanish-speaking LM children.
The range of ability over which the reliability of the Print Knowledge and Definitional Vocabulary subtests was adequate was larger for Spanish-speaking LM children than for monolingual English-speaking children. This finding may, however, be an artifact of the age range of children in the sample. The TOPEL was developed for use with preschool children (i.e., 3 to 6 years of age), and the data used for this study came from the end of children's final year of preschool prior to entering kindergarten. Because knowledge of words is distributed across languages for Spanish-speaking LM children (e.g., Peña et al., 2002), Spanish-speaking LM children had lower scores on the Definitional Vocabulary subtest than did monolingual English-speaking children. If monolingual English-speaking children were approaching the ceiling of the measure by the end of preschool but Spanish-speaking LM children were not, there would be less variance in vocabulary knowledge for monolingual English-speaking children than for Spanish-speaking LM children at above-average levels of ability. Consequently, at above-average levels of ability, the TOPEL would measure vocabulary knowledge less reliably for monolingual English-speaking children than for Spanish-speaking LM children toward the end of the preschool years. Overall, the pattern of results of this study and prior research (e.g., Farrington et al., 2015; Mahoney, 2008) indicates that English-language assessments of academic skills can be used to reliably assess those skills among Spanish-speaking LM children. However, the purpose of assessment should ultimately guide decisions regarding the language in which the assessment is conducted.
If the purpose of assessment is to identify a disability, it is important to include measures of ability in children's L1 as well as L2 to ensure that any observed deficit truly represents a deficit and is not the result of limited proficiency in L2 (AERA, APA, & NCME, 2014). This may be especially important when attempting to identify specific language impairments, as Spanish-speaking LM children's language input is distributed across two languages, resulting in low levels of vocabulary knowledge in each of L1 and L2 at any given age when compared to monolingual norms (e.g., Mancilla-Martinez & Vagh, 2013; Pearson, Fernández, & Oller, 1993). In contrast, if the purpose of assessment is to predict future development of skills in a given language, then that language should be the language of assessment, as evidence indicates that longitudinal relations between academic skills in L1 and L2 are stronger within than across languages (e.g., Goodrich, Lonigan, Kleuver, & Farver, 2016; Solari et al., 2014).

4.4. Limitations

Despite the numerous strengths of this study (e.g., large sample size, diverse group of Spanish-speaking children), this study had several limitations. First, this sample came primarily from low-income backgrounds; therefore, the results of this study may not generalize to other groups of LM children. For example, SES is significantly related to the developing language skills of LM children (e.g., Hoff, 2013), and items on the TOPEL may not function differently for monolingual English-speaking and Spanish-speaking LM children from high-SES backgrounds. Additionally, although Spanish-speaking LM children in this study came from geographically diverse regions of the U.S., data were not available on the dialects of Spanish (e.g., Mexican, Cuban, Puerto Rican) to which children were exposed. It is possible that the degree to which DIF occurs on the Definitional Vocabulary subtest is a function of which dialect of Spanish children speak. It is also possible that results would differ for LM children from other language backgrounds. For example, more evidence of DIF may be present on the Phonological Awareness and Print Knowledge subtests when comparing monolingual English-speaking children to speakers of non-alphabetic languages (e.g., Mandarin Chinese).
Finally, a fundamental assumption of DIF analysis is that, when DIF is tested for any one item, no other items on the assessment show DIF. One direction for future research is the empirical identification of an ideal set of anchor items for a given assessment: items that do not show DIF and against which DIF for the remaining items could be tested.

5. Conclusions

Overall, the results of this study indicate that the TOPEL can be used to assess English early literacy skills among Spanish-speaking LM children. Relatively little evidence of uniform DIF emerged across the three subtests, and, in some cases, estimates of DIF indicated that Spanish-speaking LM children performed better on certain items than did monolingual English-speaking children. Nevertheless, it is important to consider the purpose of assessment when determining the language of assessment. Future research should also evaluate whether English-language assessments function similarly for monolingual English-speaking and Spanish-speaking LM children, especially when the norming samples did not include sufficient numbers of Spanish-speaking LM children.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at https://doi.org/10.1016/j.ecresq.2018.10.007.

References

Abedi, J., Lord, C., & Hofstetter, C. (1998). Impact of selected background variables on students’ NAEP math performance (CSE Technical Report 478). Los Angeles, CA: University of California Regents.
Abedi, J., Hofstetter, C. H., & Lord, C. (2004). Assessment accommodations for English language learners: Implications for policy-based empirical research. Review of Educational Research, 74, 1–28.
Abedi, J. (2004). The No Child Left Behind Act and English language learners: Assessment and accountability issues. Educational Researcher, 33, 4–14.
Abedi, J. (2011). Assessing English language learners: Critical issues. In M. D. R. Basterra, E. Trumbull, & G. Solano-Flores (Eds.), Cultural validity in assessment: Addressing linguistic and cultural diversity (pp. 49–71). New York, NY: Routledge.
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
Anthony, J. L., & Francis, D. J. (2005). Development of phonological awareness. Current Directions in Psychological Science, 14, 255–259.
Artiles, A. J., Rueda, R., Salazar, J. J., & Higareda, I. (2005). Within-group diversity in minority representation: English language learners in urban school districts. Exceptional Children, 71, 283–300.
Bedore, L. M., Peña, E. D., García, M., & Cortez, C. (2005). Conceptual versus monolingual scoring: When does it make a difference? Language, Speech, and Hearing Services in Schools, 36, 188–200.
Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological), 57, 289–300.
Branum-Martin, L., Tao, S., Garnaat, S., Bunta, F., & Francis, D. J. (2012). Meta-analysis of bilingual phonological awareness: Language, age, and psycholinguistic grain size. Journal of Educational Psychology, 104, 932–944.
Callahan, R. M. (2013). The English learner dropout dilemma: Multiple risks and multiple resources (Report No. 19). Retrieved from California Dropout Research Project website: http://cdrpsb.org/researchreport19.pdf
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Cummins, J. (2017). Teaching for transfer in multilingual school contexts. In O. García, A. M. Y. Lin, & S. May (Eds.), Bilingual and multilingual education (pp. 103–116). Cham, Switzerland: Springer International Publishing AG.
Durgunoglu, A. Y., Nagy, W. E., & Hancin-Bhatt, B. J. (1993). Cross-language transfer of phonological awareness. Journal of Educational Psychology, 85, 453–465.
Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Farrington, A. L., Lonigan, C. J., Phillips, B. M., Farver, J. M., & McDowell, K. D. (2015). Evaluation of the utility of the Revised Get Ready to Read! for Spanish-speaking English-language learners through differential item functioning analysis. Assessment for Effective Intervention, 40, 216–227.
Goodrich, J. M., Lonigan, C. J., & Farver, J. M. (2013). Do early literacy skills in children’s first language promote development of skills in their second language? An experimental evaluation of transfer. Journal of Educational Psychology, 105, 414–426.
Goodrich, J. M., & Lonigan, C. J. (2017). Language-independent and language-specific aspects of early literacy: An evaluation of the common underlying proficiency model. Journal of Educational Psychology, 109, 782–793.
Goodrich, J. M., Lonigan, C. J., Kleuver, C. G., & Farver, J. M. (2016). Development and transfer of vocabulary knowledge in Spanish-speaking language minority preschool children. Journal of Child Language, 43, 969–992.
Gottardo, A., & Mueller, J. (2009). Are first- and second-language factors related in predicting second-language reading comprehension? A study of Spanish-speaking children acquiring English as a second language from first to second grade. Journal of Educational Psychology, 101, 330–344.
Hammer, C. S., Lawrence, F. R., & Miccio, A. W. (2007). Bilingual children’s language abilities and early reading outcomes in Head Start and kindergarten. Language, Speech, and Hearing Services in Schools, 38, 237–248.
Hoff, E. (2013). Interpreting the early language trajectories of children from low-SES and language-minority homes: Implications for closing achievement gaps. Developmental Psychology, 49, 4–14.
Kieffer, M. J., Lesaux, N. K., Rivera, M., & Francis, D. J. (2009). Accommodations for English language learners taking large-scale assessments: A meta-analysis on effectiveness and validity. Review of Educational Research, 79, 1168–1201.
Kim, Y., & Jang, E. E. (2009). Differential functioning of reading subskills on the OSSLT for L1 and ELL students: A multidimensionality model-based DBF/DIF approach. Language Learning, 59, 825–865.
Klingner, J. K., Artiles, A. J., & Barletta, L. M. (2006). English language learners who struggle with reading: Language acquisition or LD? Journal of Learning Disabilities, 39, 108–128.
Koo, J., Becker, B. J., & Kim, Y. (2014). Examining differential item functioning trends for English language learners in a reading test: A meta-analytical approach. Language Testing, 31, 89–109.
Lindsey, K. A., Manis, F. R., & Bailey, C. E. (2003). Prediction of first-grade reading in Spanish-speaking English-language learners. Journal of Educational Psychology, 95, 482–494.
Lonigan, C. J., Wagner, R. K., Torgesen, J. K., & Rashotte, C. A. (2007). Test of Preschool Early Literacy. Austin, TX: Pro-Ed.
Lubliner, S., & Hiebert, E. H. (2011). An analysis of English–Spanish cognates as a source of general academic language. Bilingual Research Journal, 34, 76–93.
Mahoney, K. (2008). Linguistic influences on differential item functioning for second language learners on the National Assessment of Educational Progress. International Journal of Testing, 8, 14–33.
Mancilla-Martinez, J., & Vagh, S. B. (2013). Growth in toddlers’ Spanish, English, and conceptual vocabulary knowledge. Early Childhood Research Quarterly, 28, 555–567.
Manis, F. R., Lindsey, K. A., & Bailey, C. E. (2004). Development of reading in grades K–2 in Spanish-speaking English language learners. Learning Disabilities Research & Practice, 19, 214–224.
Martiniello, M. (2009). Linguistic complexity, schematic representations, and differential item functioning for English language learners in math tests. Educational Assessment, 14, 160–179.
Meade, A. W. (2010). A taxonomy of effect size measures for the differential functioning of items and scales. Journal of Applied Psychology, 95, 728–743.
Melby-Lervåg, M., & Lervåg, A. (2011). Cross-linguistic transfer of oral language, decoding, phonological awareness and reading comprehension: A meta-analysis of the correlational evidence. Journal of Research in Reading, 34, 114–135.
Muñoz, M. L., White, M., & Horton-Ikard, R. (2014). The identification conundrum. The ASHA Leader, 19, 48–53.
Muthén, L. K., & Muthén, B. O. (1998–2015). Mplus user’s guide (7th ed.). Los Angeles, CA: Muthén & Muthén.
National Center for Educational Statistics. (2011). Achievement gaps: How Hispanic and White students in public schools perform in mathematics and reading on the National Assessment of Educational Progress. Retrieved from https://nces.ed.gov/nationsreportcard/pdf/studies/2011485.pdf
National Center for Educational Statistics. (2015). Mathematics grades 4 and 8 assessment report cards: Summary data tables for national and state scores and achievement level results. Retrieved from https://www.nationsreportcard.gov/reading_math_2015/#mathematics/groups?grade=4
National Center for Educational Statistics. (2016a). English language learners in public schools. Retrieved from https://nces.ed.gov/programs/coe/pdf/coe_cgf.pdf
National Center for Educational Statistics. (2016b). Reading performance. Retrieved from https://nces.ed.gov/programs/coe/pdf/coe_cnb.pdf
Nagy, W., García, G., Durgunoglu, A., & Hancin-Bhatt, B. (1993). Spanish–English bilingual students’ use of cognates in English reading. Journal of Reading Behavior, 25, 241–259.
Nash, R. (1997). NTC’s dictionary of Spanish cognates. Chicago, IL: NTC Publishing Group.
Oller, D. K., Pearson, B. Z., & Cobo-Lewis, A. B. (2007). Profile effects in early bilingual language and literacy. Applied Psycholinguistics, 28, 191–230.
Peña, E. D., Bedore, L. M., & Zlatic-Giunta, R. (2002). Category-generation performance of bilingual children. Journal of Speech, Language, and Hearing Research, 45, 938–947.
Pearson, B. Z., Fernández, S. C., & Oller, D. K. (1993). Lexical development in bilingual infants and toddlers: Comparison to monolingual norms. Language Learning, 43, 93–120.
Penfield, R. D., & Lam, T. C. (2000). Assessing differential item functioning in performance assessment: Review and recommendations. Educational Measurement: Issues and Practice, 19, 5–15.
Prevoo, M. J. L., Malda, M., Mesman, J., & van IJzendoorn, M. H. (2016). Within- and cross-language relations between oral language proficiency and school outcomes in bilingual children with an immigrant background: A meta-analytical study. Review of Educational Research, 86, 237–276.
Proctor, C. P., August, D., Carlo, M. S., & Snow, C. (2006). The intriguing role of Spanish language vocabulary knowledge in predicting English reading comprehension. Journal of Educational Psychology, 98, 159–169.
Proctor, C. P., Silverman, R. D., Harring, J. R., & Montecillo, C. (2012). The role of vocabulary depth in predicting reading comprehension among English monolingual and Spanish–English bilingual children in elementary school. Reading and Writing, 25, 1635–1664.
Rainelli, S., Bulotsky-Shearer, R. J., Fernandez, V. A., Greenfield, D. B., & López, M. (2017). Validity of the first two subtests of the preschool language assessment scale as a language screener for Spanish-speaking preschool children. Early Childhood Research Quarterly, 38, 10–22.
Storch, S. A., & Whitehurst, G. J. (2002). Oral language and code-related precursors to reading: Evidence from a longitudinal structural model. Developmental Psychology, 38, 934–947.
Solari, E. J., Aceves, T. C., Higareda, I., Richards-Tutor, C., Filippini, A. L., Gerber, M. M., et al. (2014). Longitudinal prediction of 1st and 2nd grade English oral reading fluency in English language learners: Which early reading and language skills are better predictors? Psychology in the Schools, 51, 126–142.
Sullivan, A. L. (2011). Disproportionality in special education identification and placement of English language learners. Exceptional Children, 77, 317–334.
Thissen, D. (2001). IRTLRDIF v.2.0b: Software for the computation of the statistics involved in item response theory likelihood-ratio tests for differential item functioning. Chapel Hill, NC: L. L. Thurstone Psychometric Laboratory.
United States Congress. (2004). Individuals with Disabilities Education Improvement Act of 2004. Available at: https://www.gpo.gov/fdsys/pkg/PLAW-108publ446/pdf/PLAW-108publ446.pdf
Wolf, M. K., & Leon, S. (2009). An investigation of the language demands in content assessments for English language learners. Educational Assessment, 14, 139–159.