Eye color and the practice of statistics in Grade 6: Comparing two groups

Jane Watson (University of Tasmania, Australia)
Lyn English (Queensland University of Technology, Australia)

ARTICLE INFO

ABSTRACT

Keywords: Comparing groups; Grade 6 students; Informal inference; Statistical reasoning in middle school

This study followed the progress of 85 Grade 6 students as they expanded their understanding and application of the practice of statistics to include comparing two groups of people: those with brown eyes and those with eyes of other colors. Based on a claim in the media that brown-eyed people had faster reaction times than others, the students collected data from their class to explore and evaluate the claim for their class and make an inference for all Grade 6 students. They then collected and analyzed four random samples of the same size as their class from a national “population” of Grade 6 students. Finally the data from the “population” of 1786 Grade 6 students were used to evaluate the claim. Data for analysis of student capacity to engage in the practice of statistics were collected from student workbooks completed while carrying out the activity, corroborated by transcripts of all class discussion, and from an assessment administered following the activity. Although the correlation of outcomes from the workbook and assessment was significant (p < 0.01) and many students completed the activity in a highly competent manner, the analysis also found conceptual understanding was not retained as well as procedural understanding.

1. Introduction

As more research is carried out in relation to the implementation of school curricula including statistics, the question arises as to how far students can progress before they reach the need for formal statistics. Can they understand the concepts involved if they do not have formal techniques such as means, standard deviations, confidence intervals, and t-tests? Near the end of a 3-year longitudinal project introducing beginning inference, Grade 6 students participated in an activity that gave them the opportunity to answer a question based on comparing two groups, the type of question that would be introduced in a beginning tertiary level statistics course after the introduction of the normal distribution, t-distribution, and one-sample inference procedures (Moore & McCabe, 1989). Working within an informal inference context (Makar & Rubin, 2009), there was interest in what evidence students would use to make a decision and what degree of confidence they would have in that decision. Having previously experienced decision-making about typicality in a single set of data, students were introduced to a second variable, creating two groups for comparison. This paper surveys the three perspectives on statistics education that suggest such a study is feasible and relevant; presents the background of the students in readiness for the activity, including understanding measures of center; reports on the students’ capacities to deal with the demands of the activity; and assesses their recall two weeks following the activity.



Corresponding author. E-mail address: [email protected] (J. Watson).

http://dx.doi.org/10.1016/j.jmathb.2017.06.006
Received 4 November 2016; Received in revised form 2 June 2017; Accepted 29 June 2017
© 2017 Elsevier Inc. All rights reserved.



2. Perspectives and literature review

Three perspectives on statistics education provide the foundation and motivation for the study reported here. First is the growing interest in introducing students to authentic statistical investigations during the primary school years, going beyond a focus on the tools to be used in later more theoretical statistics. Second is the mystique associated with comparing two independent samples with t-tests, which is very common in tertiary statistics courses and much science and social science research. Third is the recognition that as well as learning and applying statistical techniques, students need to be able to understand and judge the results and claims of others. This is embodied in statistical literacy and includes the critical thinking ability to challenge, when necessary, claims that are made in wider society. These perspectives are considered in turn.

2.1. The practice of statistics

The phrase “practice of statistics” was coined by Moore and McCabe (1989) for an introductory tertiary statistics textbook in the same year as the National Council of Teachers of Mathematics (NCTM, 1989) published the Curriculum and Evaluation Standards for School Mathematics. Moore and McCabe’s intent was to “introduce readers to statistics as it is practiced … focused on problem solving” (p. xi). The NCTM laid the groundwork for accepting the importance of the practice of statistics early in students’ mathematics education, with students in Grades K to 4 having the following experiences:

• collect, organize, and describe data;
• construct, read, and interpret displays of data;
• formulate and solve problems that involve collecting and analyzing data. (p. 54)

These are the ingredients that were later put together in the Guidelines for Assessment and Instruction in Statistics Education (GAISE) (Franklin et al., 2007) by a joint committee of the American Statistical Association (ASA) and the NCTM for statistics education at school. The GAISE framework for “statistical problem solving” is based on a foundation acknowledging variability:

• Formulate questions, anticipating variability;
• Collect data, acknowledging variability;
• Analyze data, taking account of variability;
• Interpret results, allowing for variability. (p. 11)

This framework places the experiences of the NCTM Standards in the required order for carrying out a statistical investigation. The emphasis on variability reflects the foundation laid by Moore (1990), whereas the label “problem solving” recognizes the earlier view of Rao (1975) that “statistics ceases to have meaning if it is not related to any practical problem” (p. 152). In the years between 1989 and 2007, Wild and Pfannkuch (1999) analyzed the work of their statistical consultant colleagues and produced a 4-dimensional model of a statistician’s work. The investigative cycle dimension closely resembles the GAISE framework, suggesting steps of Problem, Plan, Data, Analysis, and Conclusion (PPDAC), with Plan and Data combined in the GAISE description of Collect data. PPDAC has influenced the New Zealand Mathematics and Statistics Curriculum (Ministry of Education, 2009), which includes a subheading for Statistical Investigation at every level of the school curriculum. The decision-making at this level of students’ education is now generally termed “informal statistical inference” (Makar & Rubin, 2009), acknowledging that “formal inference” is the phrase used at the tertiary level where a theoretical foundation is laid first. Informal inference refers to decision-making in relation to a statistical question for a population based on evidence from a sample and acknowledging a degree of uncertainty in that decision. The degree of certainty depends on the quality of the evidence, which at the school level has a descriptive rather than a theoretical basis.

In recent years, classroom interventions to document and evaluate primary school students’ capabilities to undertake the practice of statistics have had a variety of foci, including having students pose survey questions for an investigation (e.g., English, 2014; Lavigne & Lajoie, 2007), having students collect data themselves (e.g., English & Watson, 2015a, 2015b; Watson & English, 2015), and having students represent and analyze data to draw conclusions (e.g., Ben-Zvi, Aridor, Makar, & Bakker, 2012; Makar, 2014). The specific acknowledgement of uncertainty in the decisions reached has also received attention (e.g., Zieffler & Fry, 2015). Few studies, however, combine all aspects of the practice in a single investigation.

Completing a statistical investigation involves both procedural and conceptual understanding (Baroody, Feil, & Johnson, 2007; Hiebert & Carpenter, 1992). Procedural aspects of calculating measures of center or drawing specific types of graphs are required as contributions to the conceptual understanding needed to connect all of the components of the GAISE framework for a complete investigation (Franklin et al., 2007). There is much literature on the relationship of these two types of understanding in mathematics more generally than statistics; for example, Hiebert and Grouws (2007) speak of skill efficiency and conceptual understanding, with the former related to procedures and contributing to the latter (p. 380). The question of procedural knowledge being deeper than skill efficiency is addressed by Baroody et al. (2007) in suggesting the interrelationship of procedural and conceptual understanding with the possibility of deep procedural knowledge arising from links to related conceptual knowledge.
The aim of educational interventions should be to foster both types of knowledge and their continuing support of each other. In the context of statistical investigations, Groth and Bergner (2006) considered the existence and relationship of procedural and conceptual knowledge in understanding of measures of center by preservice elementary teachers. Groth (2014) extended this work to the concept of variation by considering preservice teachers’ procedural and conceptual knowledge of the mean absolute deviation.


Again the relationship was hierarchical from procedural to conceptual. At the school level, development of both procedural and conceptual understanding is essential and their relationship may be complex. This relationship is a focus of the current study.

2.2. Comparing two data sets

When completing the practice of statistics, many types of problems can be posed. The one used in this study follows in a tradition popular at the tertiary level when theoretical statistics are introduced employing t-tests to compare two data sets to decide if they represent different or similar findings (e.g., Moore & McCabe, 1989). Although school students are not expected to use formal techniques for comparing two data sets, there has been interest for many years in the methods and tools they would use for the comparison. In fact, it was in the same year as the NCTM Standards that Gal, Rothschild, and Wagner (1989) first reported on interviews with Grade 3 and 6 students deciding which of two groups of frogs had jumped further, with data presented in a graphical format. Later they worked with students in Grades 3, 6, and 9, including data sets of unequal size for comparison, finding most students did not use the arithmetic mean in either this setting or one where the data sets were of equal size (Gal et al., 1990). A majority of Grade 6 and 9 students, however, were familiar with the mean in both contexts. Watson and Moritz (1999) used one of Gal et al.’s (1990) protocols in interviews with 88 Australian students in Grades 3–9, with findings similar to Gal et al., in that students were not naturally choosing the mean as a tool for comparison. In a follow-up longitudinal study of 42 of these students three or four years later, Watson (2001) found that of those who could improve their performance, 62% did so.

For some time suggestions have been made about activities for school students based on comparing groups; for example, Rubin, Corwin, and Friel (1997) suggested comparing the lengths of male and female cats and comparing how long two groups (e.g., children and adults) could stand on one foot with their eyes closed. Brodesky, Doherty, and Stoddard (2008) suggested comparing signature length, either using dominant and non-dominant hands or considering males and females. Their ideas were supported by the software TinkerPlots: Dynamic Data Exploration (Konold & Miller, 2005), as were suggestions for comparing the lengths of two types of venomous snakes and for comparing various characteristics of male and female students by Watson et al. (2011). These contexts, as well as considering the fizz time of different effervescent antacid tablets (Kader & Jacobbe, 2013), provide starting points for research, but detailed research on classroom activities has not been common. As early as 1999, Cobb used a minitool as a basis for in-depth classroom discussion of comparing two groups. His focus was a class’s development of structuring and partitioning data for creating arguments. TinkerPlots was instrumental in the development of informal inference in the Connections project (Ben-Zvi, Gil, & Apel, 2007), and comparing groups was one of the themes introduced for Grade 5 students using TinkerPlots (Makar, Bakker, & Ben-Zvi, 2011). Allmond and Makar (2014) used a setting of comparing groups in a series of lessons with middle school students, introducing hat plots and boxplots in TinkerPlots to focus on variability in distributions.
At the tertiary level, Biehler (2007) and Frischemeier (2014) reported on tertiary students’ comparison of groups using TinkerPlots and boxplots, as did delMas, Garfield, and Zieffler (2014). Bakker and Derry (2011) claimed that the exploration of this and other themes, using TinkerPlots, resulted in much more intuitive use of measures of center, such as mean and median, for school students. This was an improvement from the earlier observations of Gal et al. (1990) and Watson and Moritz (1999). In the current study, TinkerPlots (Konold & Miller, 2011) was used as the tool for a complete investigation comparing two data sets.

By 2000, the NCTM’s Principles and Standards for School Mathematics recognized that by Grades 3–5 students should “compare related data sets, with an emphasis on how the data are distributed” (p. 176). This emphasis, along with much of the content of Data Analysis and Probability, however, was lost in the later document Curriculum Focal Points for Prekindergarten through Grade 8 Mathematics (NCTM, 2006). In 2010, the Common Core State Standards for Mathematics (Common Core State Standards Initiative, 2010) did not consider Statistics and Probability until Grade 6 and moved directly into sampling from two populations in Grade 7 to make comparisons. In New Zealand, the Achievement Objectives for Mathematics and Statistics (Ministry of Education, 2009) introduced multivariate data at Level 3, where Level 3 is one of eight progressive levels across 13 years of schooling. In Australia, however, the Australian Curriculum Mathematics (Australian Curriculum, Assessment and Reporting Authority (ACARA), 2015) did not mention comparing two sets of numerical data until Grade 9, where mean, median, and range were used to do so. This study challenges the Australian curriculum by introducing comparison of data sets in Grade 6.

2.3. Critical statistical literacy

Statistical literacy for school students is generally recognized as “the meeting point of the data and chance curriculum and the everyday world, where encounters involve unrehearsed contexts and spontaneous decision-making based on the ability to apply statistical tools, general contextual knowledge, and critical literacy skills” (Watson, 2006, p. 11). Gal (2002) reiterates this point for adults and specifically adds the ability to communicate reactions to claims. Besides developing an understanding of the practice of statistics, curricula now recognize this need for critical thinking in situations to judge the claims resulting from the practice of statistics as carried out by others. New Zealand does this in a specific subsection of the Mathematics and Statistics curriculum on “Statistical Literacy” (Ministry of Education, 2009) and Australia does it through two General Capabilities: Numeracy through Interpreting Statistical Information and Critical and Creative Thinking (ACARA, 2013).

A motivating way to provide an authentic context for investigating claims is to use reports from the media. Surveys of overall statistical literacy (e.g., Watson & Callingham, 2003) generally include such extracts from the media for students to analyze and question, for example a claim about guns in US schools based on a sample in Chicago (Watson, 1998) or a misleading graph on boating deaths (Watson & Chick, 2004).


The opportunity to link the practice of statistics and statistical literacy arose for a Grade 10 class from an article in the media about eye color and reaction time claiming that brown-eyed people were faster than those with eyes of other colors (Rowe & Evans, 2007). The Grade 10 class tested themselves and the brown-eyed students had faster times on average than others. This spurred the class to collect a large sample of Grade 10 students from the Australian Bureau of Statistics (ABS) CensusAtSchool website (www.abs.gov.au/censusatschool), convincing the students that there was no difference for Grade 10 students in Australia (Watson, 2008). This context was chosen for the current study as a way of motivating a complete statistical investigation and linking the practice of statistics and statistical literacy.

2.4. Overall research question

The purpose of the research reported here is to document primary students’ capacities to extend their experiences with the practice of statistics (Franklin et al., 2007) from one data set (Watson & English, in press) to include the comparison of two groups. Although comparison of groups is now recognized by curriculum documents in some countries at this level (e.g., NCTM, 2000), there is little focus on the importance of the use of evidence and recognition of uncertainty in making decisions, as stressed by Makar and Rubin (2009). With a view to enhancing students’ critical thinking about statistical contexts they will meet outside of school, the activity used for the study was based on a media claim that created a question about two conditions. The context provided the opportunity for students to test the claim for themselves and for a larger population. The general research question considers Grade 6 students’ capacities to carry out the practice of statistics and use the results to make a decision about the claim in the media article. Details of the overall research question are given in the following section.

3. Background and method

3.1. Overall design and previous investigations

The investigation presented in this paper was the sixth of seven major day-long activities that were part of a three-year longitudinal study of a group of students as they progressed from Grade 4 to Grade 6, designed to build an understanding of informal inference as part of statistical literacy. The design-based study followed a cyclic pathway of (a) design of materials for teachers and students with preparation of teachers, (b) a teaching intervention accompanied by data collection, and (c) analyses of data leading to suggestions for future teaching interventions (Cobb, Confrey, diSessa, Lehrer, & Schauble, 2003; Cobb, Jackson, & Munoz, 2016). These suggestions were sometimes based on observations related to needed reinforcement or to readiness to move further in developing the practice of statistics. For each activity, a professional learning session was held with the teachers, and they were provided with printed teaching guidelines, interspersed with the student workbooks.

There were three investigations in the first year, Grade 4. Students first worked in groups to develop multiple-choice survey questions for their classmates related to possible improvements to the school playground (English & Watson, 2015b). The second investigation considered the difference in the variation of arm span lengths in two situations: the arm span of a single student measured repeatedly by all others in the class and single arm span measurements of all members of the class (English & Watson, 2015a). The third investigation was based on modelling the outcomes first when tossing one coin and later when tossing two coins (English & Watson, 2016).

The practice of statistics was introduced formally in Grade 5 with two investigations based on the 4-step GAISE framework (Franklin et al., 2007). The context for the fourth investigation was a set of questions from the ABS CensusAtSchool site about environmentally friendly habits, based on categorical (yes/no) data (Watson & English, 2015). The fifth investigation returned to collecting numerical, measurement data, this time related to typical reaction time. In preparation for the use of measurement data, a preliminary lesson formally introduced the mean, the mode, the median and the hat plot in TinkerPlots (Konold & Miller, 2011). A hat plot is a representation similar to a boxplot that highlights the middle 50% of the data symbolically with the crown of a hat, while the two brims of the hat distinguish the lowest and highest 25% of the data (Watson, Fitzallen, Creed, & Wilson, 2008). The investigation began with posing the question, “What is the typical reaction time of Grade 5 students?” (Watson & English, in press). Students collected data in two ways: by dropping and catching a ruler and by using a timing device on the ABS website. Decisions on a typical reaction time were made for their class, their school, and Australia for each method; differences in the methods were then considered. At the conclusion of the activity it was felt students appreciated the 4-step practice-of-statistics framework and were ready to extend it to comparing two data sets. In all activities except the first, students used TinkerPlots for part of their analyses.
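For readers unfamiliar with the hat plot described above, its crown covers the middle 50% of the data and its brims the lowest and highest 25%. The short sketch below (not part of the teaching materials, which relied on TinkerPlots) illustrates how the four measures of center discussed with the students and the hat-plot crown boundaries could be computed; the reaction times are invented for illustration only.

from statistics import mean, median, multimode, quantiles

# Invented reaction times (seconds), for illustration only
reaction_times = [0.32, 0.35, 0.38, 0.40, 0.40, 0.42, 0.45, 0.47, 0.51, 0.60]

print("mean:  ", round(mean(reaction_times), 3))
print("median:", median(reaction_times))
print("mode:  ", multimode(reaction_times))   # most frequent value(s)

# Hat plot: the crown spans the middle 50% of the data (first to third quartile);
# the brims cover the lowest and highest 25%.
q1, q2, q3 = quantiles(reaction_times, n=4)
print("crown of hat plot:", (round(q1, 3), round(q3, 3)))
print("left brim: ", (min(reaction_times), round(q1, 3)))
print("right brim:", (round(q3, 3), max(reaction_times)))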

3.2. Current study

The sixth investigation, at the start of the third year, extended the previous one, using only data from the ABS reaction timer, but also collecting data on eye color in order to consider the question of whether brown-eyed Grade 6 students had faster reaction times than students with eyes of other colors. It was also felt that the students were ready to engage more seriously with measures of center, including the hat plots to reinforce thinking about a range of values, as well as to explore and appreciate variation of two types, within each group and between them.


3.3. Specific research questions

It was not possible to carry out formal pre- and post-tests in the school environment where the study took place. To compensate for this, preliminary questions were asked at the beginning of the students’ Workbooks, reviewing their understanding of mean, median, mode, and hat plot. Also, at the end of the activity, an assessment was created and administered to complement the results from student workbooks and class transcripts. The assessment doubled as a test to assist teachers in report-writing. The phrase “nature of understanding” is used in the research questions to characterize student capacity to engage in the practice of statistics and display the procedural and conceptual knowledge embedded within that practice.

3.3.1. Preliminary question
What is the nature of understanding of measures of center that students bring to the investigation?

3.3.2. Main part of the activity
What is the nature of understanding students display in making a decision comparing two groups (populations) for a statistical question, (a) first for their class data, (b) second for four random samples from a CensusAtSchool “population”, and (c) third for the CensusAtSchool “population” of 1786 students?

3.3.3. Post-activity assessment
What is the nature of students’ recall after two weeks?

3.4. Participants

Data were collected from 85 students engaged in the day-long investigation, who had previously taken part in a teacher-initiated lesson on measures of center and hat plots in TinkerPlots and the investigation on typical reaction time the previous year (Watson & English, in press). Of the 85, 83 completed an Assessment within two weeks of finishing the main investigation. The average age of the students was 11 years and 7 months (range 11 years 0 months to 12 years 10 months) and 41% were classified as speaking English as a second language (ESL). Only students whose parents gave written permission were included in the data analysis, but all students (89) in the classes participated in the investigation.

3.5. Procedure for activity

There were four classes with four different classroom teachers and the research took place across four successive school days, approximately five hours of class time for each class. The day began with a review and discussion of the four steps in the Practice of Statistics from the two major activities of the previous year, referring to a poster on the wall of the classroom, and a further review of the four measures of center (mean, median, mode, and crown of a hat plot). Referring back to the “typical” reaction time lesson at the end of the previous year, there was discussion of the plots of data from the four classes emphasizing the measures of center and the language of variation, including range, clumps, clusters, gaps, and spread. Students also discussed a plot of the Grade 10 class’s data comparing Brown-eyed and Other students (Watson, 2008), which showed Brown-eyed students were faster but fewer in number. While single students were having their reaction times measured with the ABS timer, others filled in a page in their Workbooks (Q2–Q5) with a definition and example of when/where each tool could be used: mode, median, mean, and hat plot. The class then read together the media extract in Fig. 1, which occupied a page in the student Workbook. Opinions were expressed around the class about whether they believed the outcomes of the researchers’ study.
Referring back to the investigation of their own reaction times the previous year, students were happy to take on the investigation themselves with their class data to see if there were a difference in reaction time for those with Brown or Other-colored eyes. Confirmation of the question being posed was the first step of the Practice of Statistics: Do Brown-eyed Grade 6 students have faster reaction times than Other Grade 6 students? The students then collected the data from themselves as the second step of the investigation. All student data analysis in this investigation took place in TinkerPlots and the class data were provided in a TinkerPlots file on laptops for pairs of students to work on together. The data sets for each of the four classes are shown in Appendix A for reference by the reader, along with a summary of data for the two groups. Students were not given these plots and information, only the data set in the Data Cards in TinkerPlots, with reminders in the Workbook as to how to drag down a Plot, insert the data, and place the attributes on the horizontal and vertical axes. The third step of the Practice of Statistics, analysis, was carried out by the student pairs as they worked within the TinkerPlots file. Students wrote their own explanations in their individual Workbooks for how they reached their decisions, the fourth step in the Practice of Statistics. The four questions students answered in their Workbooks are shown in Fig. 2. Students were then re-introduced to the ABS CensusAtSchool website. With an ABS “population” of 1786 Grade 6 students, including attributes of eye color and reaction time, student pairs collected their own random samples of the same size as their class. This was done four times to experience the variation that occurs in samples of size around 25. Using their four samples, students reconsidered the third (Q8) and fourth (Q9) questions in Fig. 2. Finally, they were allowed to use the entire ABS data set shown in Fig. 3 to reconsider again and state their confidence in their decisions about the question posed. Students created their own plots of the data rather than being given the format in Fig. 3.
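The repeated sampling step described above (four random samples, each the size of the class, drawn from the CensusAtSchool “population” of 1786 students) was carried out with TinkerPlots’ sampler. A minimal sketch of the same idea is given below, with an invented population standing in for the ABS file; the eye-color proportion and reaction times are placeholders, not the actual data.

import random

# Invented stand-in for the ABS CensusAtSchool "population" of 1786 Grade 6 students:
# each record is (eye_color, reaction_time_in_seconds).
random.seed(1)
population = [("Brown" if random.random() < 0.4 else "Other",
               round(random.uniform(0.15, 0.95), 2))
              for _ in range(1786)]

class_size = 25  # approximately the size of one Grade 6 class

# Draw four random samples of class size and compare group means in each,
# to experience sample-to-sample variation.
for i in range(4):
    sample = random.sample(population, class_size)
    brown = [t for colour, t in sample if colour == "Brown"]
    other = [t for colour, t in sample if colour == "Other"]
    mean_brown = sum(brown) / len(brown) if brown else float("nan")
    mean_other = sum(other) / len(other) if other else float("nan")
    print(f"Sample {i + 1}: mean Brown = {mean_brown:.3f}, mean Other = {mean_other:.3f}")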


Fig. 1. Media extract motivating the investigation.

Fig. 2. Questions in student Workbook.

Students’ consolidation of understanding, both of their use of TinkerPlots and of their capacities to make an informal inference (in the sense of Makar and Rubin (2009)) for comparing groups, was assessed with the instrument in Appendix B. The Assessment was devised by the researchers specifically for the activity undertaken, for use by the classroom teachers as well as the researchers. Its validity was based on consultation with the teachers and observations by the researchers during the activity.

3.6. Data analysis

The coding of the responses in students’ Workbooks as they progressed through the activity, and of the seven questions in the Assessment, was hierarchical, based on the correctness of the response given the data available and the evidence or reasoning provided in support of the answer. As the evidence progressed from procedural to conceptual (e.g., Hiebert & Grouws, 2007), higher codes were assigned. Similar to the case for the assessment of responses for the previous activity on typical reaction time (Watson & English, in press), the rubrics reflected the sophistication shown in the use of the two types of knowledge. The rubrics for the Workbook questions are given in Appendix C.

The entries in the student Workbooks were divided into four parts for analysis. The first part included Q2 to Q5 (Q1 recorded the student’s reaction time), asking students for the definition and an example of use for the four measures of center. The results addressed the Preliminary question of the study. Q6 to Q9 (shown in Fig. 2), the second part of the Workbook, included a decision on the question of eye color and reaction time for the class (Q6), an explanation (Q7), a decision for all Grade 6 students (Q8), and how students could be more certain of their decisions (Q9). These results addressed the research question on the Main part of the activity (a).


Fig. 3. “Population” data from the ABS website, with boxplots and mean values provided.

The third part of the Workbook first asked for summaries of the plots for four random samples from the ABS and the strategies used to interpret the plots. Responses were coded in comparison to the plots saved in the TinkerPlots files, hence each pair of students had different plots to judge. First, Q10, Q12, Q14, and Q16 asked students what each plot told them. For each plot students then reported the tools or strategy they used (Q11, Q13, Q15, & Q17) and responses were assessed based on whether there was evidence in the plot of the tools and/or strategy being used. Q18 then asked if there had been any surprises from the random samples, and Q19 to Q21 again asked for a decision for all Grade 6 students based on them, similar to Q6, Q7, Q8, and Q9 in Fig. 2. These results addressed the research question for the Main part of the activity (b). Finally, Q22 asked for a decision and Q23 asked for a degree of certainty based on the ABS “population” of 1786 Grade 6 students. The research question for the Main part of the activity (c) was addressed by these results. Table 1 summarizes the relationship of the Workbook questions to these research questions.

Extracts from the transcripts of two of the class discussions were used to corroborate the findings from the student Workbooks and extend the insights into student understanding (not always expanded in writing in Workbooks). These were not coded but reflected the class within which they occurred. Although the teachers had prepared in a similar fashion, there were some differences in the discussion that accompanied the activity; in particular, the different data collected for the four classes (Appendix A) led to different comments by teachers and students, as well as different conclusions.

Coding for the Assessment (see Appendix B) was on a correct/incorrect basis for the factual and identification items, considered procedural in nature. For items allowing two higher levels of sophistication, Code 1 included partial reference to the required answer and Code 2 included the salient features of the correct reason reflecting deeper procedural knowledge related to the concepts underlying the activity (Baroody et al., 2007). The last question in Part B of the Assessment was more complex, with three levels of appreciation shown. A single reasonable contextual justification but no expression of uncertainty, or uncertainty but a non-statistical personal reason, was coded 1. A contextual justification linked to acknowledgement of uncertainty was coded 2, whereas a statistical justification and related lack of certainty was considered based on conceptual understanding and coded 3. The rubric for Part C also included a third level of response that related together all of spread, clusters, and measures of center (Code 3). Examples are provided in the Results. These outcomes addressed the final research question on the Post-activity Assessment.

Based on the first 20 sets of responses in the spreadsheet, the first author developed rubrics for the Workbook questions. All responses were then coded by an experienced researcher. Because there were four different sets of class data (Appendix A) and each pair of students collected four random samples from the CensusAtSchool site, coding for each response was uniquely based on the data sets analyzed by that student. All TinkerPlots representations were available to the coder (see Fig. 5 in the Results). For the Assessment, a similar process was used based on scans of the students’ work.
In this case, the rubrics for some of the extended questions were amended on consultation with the first author and subsequent coding agreed by the two.

Table 1
Workbook questions and associated Research Questions.

Research Question                                                                    Associated Workbook questions
Preliminary question: What is the nature of understanding of measures of
  center that students bring to the investigation?                                   Appendix C: Q2 to Q5
Main part of activity: (a) Decision for class                                         Appendix C: Q6 to Q9
Main part of activity: (b) Decision for random samples                                Appendix C: Q10 to Q21
Main part of activity: (c) Decision for CensusAtSchool “population”                   Appendix C: Q22, Q23


Table 2
Levels of understanding of measures of center at the beginning of the activity.

Code                                                    Mode    Median    Mean    Hat Plot
1 – weak, single part definition                          7%      16%      19%      19%
2 – strong, multipart definition                         29%      62%      47%      49%
3 – definition related to strong contextual example      55%       8%      22%      25%
Total                                                    92%      87%      88%      93%

The Cronbach’s alpha for the 23 Assessment items as coded was 0.830. For both the Workbook and Assessment, scores were totalled to gain a picture of overall capacity to handle comparison of two groups and the recall from the experience. The correlation of these scores gave an indication of the success of the activity in meeting the researchers’ expectations.

4. Results

The Results are presented in the order of the Research questions.

4.1. Preliminary question

For the questions at the beginning of the Workbook to monitor student procedural and conceptual understanding of measures of center at the beginning of the investigation, almost all students had at least a basic procedural descriptor for each measure. The percentages for the levels are shown in Table 2, with the mode being the easiest for students to give an example in context. For the other measures, more than half of the students could provide an appropriate definition but not a meaningful context for an example. Examples of Code 3 responses, including both a strong definition and a related context for an example, included the following.

Mode: The value that occurs the most. An example for ‘mode' in real life might be that there is an ice-cream day and you want to know what ice-cream is most popular. [ID70]

Median: The middle value (after all the values have been ordered in ascending/descending order). Example: We could use the median when getting the idea of around how much the houses in an area cost. [ID134]

Mean: The mean is the average. The mean can be found when all the values are added together and divided by the number of values. The mean can be used when finding the average of test results. [ID37]

Hat Plot: A plot where a ‘hat’ is placed over. The crown represents 50% and the brims are 25%. It can be used to get a range of typical values on reaction time. [ID99]

In answer to the Preliminary research question, the total percentages in Table 2 gave confidence that the students possessed the tools to carry out the analysis for comparing two groups. At least 87% had a basic appreciation of all four measures of center, whereas at least 69% had a strong definition if not a meaningful contextual example. Fig. 4 shows the distribution of total scores out of a possible score of 12 for the students. The distribution was skewed to the left, with a median of 8, and 79% of the total scores were at least 7 out of 12.

4.2. Main part of the activity (a): decision based on class data

Based on the data from their class, students answered the questions in Fig. 2. For Q6, 98% of responses agreed with a reasonable interpretation of the data shown in Appendix A (Class 6B − Yes, Brown; Class 6C − No, Other; 6G − No, or difficult to tell; 6S − Yes, Brown). The students were then asked what tools or strategies they had used to make their decisions (Q7). Only one student did not use a tool or strategy. Examples are shown in Table 3, reflecting movement from a more procedural approach to showing an ability to combine procedural and conceptual elements.

Fig. 4. Scores for the preliminary Workbook questions on measures of center.


Fig. 5. A TinkerPlots file after four random samples were collected and plotted.

Table 3
Examples of responses for Question Q7.

Code 0 – no tool or strategy (1%)
  The fastest child in the class has the eyes, the slowest child had brown eyes. Both are female but normally the average female has faster finger movements than the average male according to scientists. [ID54]

Code 1 – used at least one tool or strategy (42%)
  We used a hat plot to help us to find out how we reached the conclusion that brown eyed students in year six have a faster reaction time. [ID38]
  I used the mean to discover the average of the plot and also applied a ‘Vertical Reference Line’ to discover the exact location of the mean. [ID107]
  We could tell from the variation of the graphs used. We used range, shape and outliers. We didn't use tools. [ID55]

Code 2 – went on to describe how tool or strategy used (56%)
  We reached this conclusion by positioning the mean, mode, median and a hat plot on the scale. We compared these numbers and found they were located on the right side for brown-eyed people. This is closer than ‘other’ people. [ID105]
  To reach the conclusion, we used the hat plot, because using the hat plot, we know 50% of the reaction time. Therefore, we could compare the two 50% and see which eye colored students have faster reactions. [ID43]
  Although the brown eyes are more spread the blue eyes have more in a better time. We used the mean and hat plot. The blue eye mean was at a faster time. [ID35]

Ninety-five percent of students answered Q8 in the negative about the transfer of their decision to all Grade 6 students. Half gave general reasons, whereas half focused on sampling issues. For the question on being more certain (Q9), the majority of responses focused on sampling with fewer further suggesting random samples. Examples for Q8 and Q9 are given in Table 4. For a possible maximum score of 7, the range for students’ scores was 3–7, with a median of 5. The non-zero rubrics for Q7 to Q9 recognized correct decisions but gave more credit for extended or more statistically/conceptually appropriate justifications (see Appendix C). These are exemplified in the quotes provided. Of the eight students scoring a total of three for the four questions, only one did not score on two of the questions. Hence 91% of the students showed a basic appreciation of the variation between the two groups, whereas 82% could suggest data-driven issues to increase their certainty in their decisions. After students completed their analyses and answered the questions in the first part of the Workbook, class discussion expanded the repertoire of student responses, illustrating the wide range of tools that students had at their disposal. Pairs of students reported to the class on the TinkerPlots tools they used to come to a decision. The extract presented here is from Class 6G (Appendix A), which had data with the closest measures of center and hence was the class acknowledging the most difficulties and uncertainties in the process of analysis.


Table 4
Examples of responses for Questions Q8 and Q9.

Examples Q8: true for all Grade 6?
Code 0 – agree true for all Grade 6 (2%)
  I think it would be the same because we have the most brown eyes in our school. [ID54]
Code 1 – general reason, not true for all Grade 6 (49%)
  Absolutely no, because we all think, act and behave differently. [ID33]
  It would be varied for other year 6 students. [ID55]
Code 2 – sampling issue, not true for all Grade 6 (49%)
  No, because this is just a sample of one class in one region. [ID58]
  No, because we all think differently. We’re also only a little sample of all year six students. [ID68]

Examples Q9: how to be more certain?
Code 0 – extraneous issues (18%)
  Use a hat plot to be more certain of our conclusion. [ID48]
  We could recheck our results to be more certain of our conclusion. [ID74]
Code 1 – collect more or different data (65%)
  I could collect more data from more people to be more sure of my conclusion. [ID66]
  Even the amount [sic] with brown eyes with other colored eyes. [ID78]
Code 2 – collect random sample (18%)
  I would look at other random samples and see whether this theory applies to them. [ID75]
  We could collect more random samples from all across Australia. [ID84]

Pair 1a: We used the Reference Tool. [Line]
Teacher1: What did you do with it?
Pair 1a: Well, we were actually seeing if our eyes were deceiving us, um we were just seeing which one was actually further by using it. We lined it up with one of them, and just looked where it matched up with the other ones.
Teacher1: How did that help you to make your decision?
Pair 1a: Well, it showed us which one was like which … Some of them were quite close together …
Pair 1b: They weren’t close together they were just um, the Other was quite far away from the line … We decided that, no, Brown doesn’t have faster reaction time than Other.
Teacher1: Okay, doesn’t. But is the Other one faster?
Pair 1b: Some.

The teacher then turned to another group, which had done something quite different, considering many features of the plot and unable to reach a decision.

Teacher1: Okay, Pair 2a and Pair 2b?
Pair 2a: Well, we didn’t use any tools … We looked at it … And because it looked pretty straight forward …
Pair 2b: Yeah.
Pair 2a: So we, we looked at the spread and the shape and the outliers …
Pair 2b: And the range …
Pair 2a: And everything …
Pair 2b: And the outliers …
Pair 2a: And we came to the conclusion that some Brown eyed people do [have faster times] although the Other colored eyes have some better or worse.
Teacher1: Right, so overall you couldn’t really come to a final decision?
Pair 2b: Na.
Teacher1: On Brown or Other?
Pair 2b: Not really.
Pair 2a: Cause there was an outlier in the Other ones but some of the Blue eyes had faster times.

The teacher then asked if other pairs had used tools and found the issue of consistency arose.

Pair 3a: We used a Hat Plot.
Teacher1: Okay, what was interesting about those Hat Plots?
Pair 3a: Well one of them was not really a Hat, it was like a … (uses her hands to show narrow shape).

10

Journal of Mathematical Behavior xxx (xxxx) xxx–xxx

J. Watson, L. English

Pair 3b: Top Hat!
Teacher1: A Top Hat, okay, it was the tiniest Crown I have ever seen. So there you are, very unusual, okay. What did that Hat tell you?
Pair 3a: Well, it’s like really clustered together and the other one is kind of spread out more.
Teacher1: And what did that tell you about who was faster?
Pair 3a: We thought the Others were kind of faster because they were like, they were more together and they were easier to tell [close] together.
Teacher1: Right, okay, a little more consistent, alright anybody else use another tool?
Pair 4a: We used a Hat Plot and the Mean … We did 0.478889 minus 0.475714 to work out the difference between the two Means.
Teacher1: Okay, and tell me what the difference was?
Pair 4a: The difference was 0.003175.
Teacher1: Is that very much difference?
Pair 4a: No, not much.
Teacher1: Okay, I don’t think we could have actually distinguished that, do you? … So what was your conclusion?
Pair 4a: … The Brown eyes do not have a faster reaction time than Other.

The language of the two types of variation, within groups and between groups, was not specifically highlighted by the teacher. As can be seen in the above transcript, however, students intuitively referred to either or both when making their decisions. The discussion then moved to the certainty with which students had made their decisions. The students offered various reasons and the teacher led them to think more about sampling.

Teacher1: Now, how certain were you of what you decided?
S1: Not very.
Pair 1a: Uncertain.
S2: You can’t really be definitely certain because the scientists, they, and they’re scientists, they came up with a totally different answer to what [we] got and so, and we used math and all of that and they used science.
Pair 1a: For some people it wouldn’t really be about genetics because some people train and stuff.
Teacher1: That’s true too but what about, what about what we did here and our confidence based on our group? What can we say about our group? … what about our sample?
S3: We didn’t have the same number of each eye color.
Teacher1: Okay, that wasn’t exactly even was it but what else about the size of our group?
S2: It was quite small …
S4: I think we aren’t exactly the best representatives because we might be some people that, we might be rubbish compared to other people cause we have no idea what …
S5: Let’s be honest we don’t have exactly the fastest time reaction.
Teacher1: What would make a sample much better than our sample? Pair 4a?
Pair 4a: Get samples from, get samples from every school in Australia?

The class discussion then turned to sampling internationally and the difficulties for children who had not seen a computer or mouse. One student then returned the discussion to Australian states and suggested the following.

S5: [the] Education Minister or whatever, they have to send out a notice to all teachers in Year 6 in every school in each State, then they get to do the reaction time and they send it back to the Education Minister and they send it to whoever …
S6: They can send it to us!

The teacher then moved the discussion to the CensusAtSchool site to be used for the rest of the activity, referring to the previous year’s activity that had used the site for different attributes.

Teacher1: … so what kind of sample are we going to collect from the CensusAtSchool that’s going to be different from collecting just your Grade 6 at [your school]?
S7: Random.


Teacher1: And how is a random sample different from this class?
Pair 2b: Just random people.
Teacher1: Yes, but what does the “random people” mean?
S5: Just like they’re picked out from anywhere like they’re picked out from any Year 6 school.
S4: Big hat … A big hat with everyone’s names in it.

The discussion that took place in class 6C after students had completed entries in their Workbooks was different to class 6G due to the different data for the class (Appendix A), which had an imbalance of sample size for eye color and a difference in the distribution of reaction times.

Teacher2: Okay, do Brown-eyed students in your Year 6 class have faster reaction times than Other students in your class?
S1: (shakes his head)
S2: No.
S3: No!
S4: Noooooo.
Teacher2: Okay, what sort of tools did you use? S5?
S5: I used the Median.
Teacher2: … S6 what did you use?
S6: Um, I just looked at the graph.
Teacher2: Well, what did you look at, S6?
S7: This Hat Plot is closer to the edge than this Hat Plot.
Teacher2: Okay, so S7 used a Hat Plot, okay…
S6: That is more further and these are more closer, closer together, closer means faster reaction time.
Teacher2: Okay, what um, if we look at our data what conclusion do you reach? S8?
S8: Um, Brown eyed students have um, um a slower reaction time than Other-eye-colored students.
Teacher2: Okay and how certain are you of your answer?
S8: I am um, not that certain because the, the amount [sic] of Brown-eyed students um compared to Other-eye-colored students, um, um, is very big compared with each other so if they were both the same number and we got, this um, we got a similar result I’d be more certain.
Teacher2: You’d be more certain then so S9, what were you saying we could do?
S9: Um, we could um gather random samples of eye colors and reaction times and compare them to ours so we could be more certain.
Teacher2: Alright, tick! Good response. ….

At this point there is a rather lengthy discussion of the calculations for the means of the two data sets, the decimal values, and how much of a difference would be meaningful. The teacher then tries to return the focus to the suggestion of random sampling.

Teacher2: Okay, so, for us to feel more confident with our decision or our conclusion we could do what? If we wanted to feel more confident with our conclusion, what could we do?
S10: I’m already confident.
S2: Test more times.
Teacher2: Test more times? What do you mean, test each of you? Have another go?
S3: Yeah.
S5: Oh, maybe collect more data from other students.
S4: Yeah, that would work.
S1: Make sure it wasn’t a fluke.


S8: Yeah, let’s do that.
Teacher2: Collect more data from other students?
S1: Yeah, that would probably work better.
S5: Yeah, like Brown-eyed or Other, like …
Teacher2: What data are you collecting on? Their eye color or their reaction time?
S5: Reaction Time.
Teacher2: Okay, what other students?
S5: Like … um, other Year 6 students.
Teacher2: Other Year 6 students, S11 is that what you were going to say?
S11: Other classes.
Teacher2: Other classes?
S11: Results from [our school].

The teacher then needs to be specific and move on to introduce the random sampling from the CensusAtSchool site. The following passage shows the difficulty students have in articulating what they understand about random samples.

Teacher2: Okay, alright, so we are, like S9 suggested, going to do a random sample, I’m sure many of you thought that as well, what is a random sample?
S4: A sample.
S12: A random sample.
Teacher2: S13, what’s a random sample?
S13: Um, it’s like, it’s like you don’t know about the data it’s just um …
S1: (whispers) Random.
S4: (whispers) It’s random.
Teacher2: Do you want to give her a hand S14?
S14: Um, it’s um, it’s where, like, thing, ah, I forgot …
S13: Data that you don’t really know about it, it’s just … random … randomly …
Teacher2: Okay, good … Why is it better to use a random sample than just use our class to suggest what’s typical for all Year 6 students, S15?
S15: Well our class could be um, particular outliers, it could affect your conclusion.
Teacher2: It could. S5?
S5: Um, like um, everybody’s different, like, um, the data could be different.
Teacher2: Yes. S13?
S13: To see um, to see if they have the same results.
Teacher2: To see if they have the same results.

In this class the students’ lack of ability to articulate their understanding of random made it more difficult for the teacher to move on to the collection of random samples.

4.3. Main part of the activity (b): decision based on random samples

As they were completing the sampling in pairs from the CensusAtSchool “population”, students were asked to describe individually the four random samples collected (Q10, Q12, Q14, Q16). Fig. 5 shows the TinkerPlots file created by one pair of students. Every pair considered a different set of four random samples for responses to Q10 to Q17. The reasons for decisions across the four samples were quite consistent for individual students. Overall, 10% of responses did not match the plots saved in the TinkerPlots file, 78% gave a brief report on the observations from the plots (Code 1), and 12% gave extended detailed descriptions implicitly noting variation (Code 2).


Code 1: The others have a faster reaction time. [ID03] The fastest outlier is a brown eye male with a time of 0.24. [ID24]

Code 2: The other colors are faster. The hat was more shifted to the right. [ID59] The mean for brown is faster. The fastest for brown is way faster than the other. The cluster on brown is further down left which means it [is] faster. [ID148]

When asked about tools and strategies used, again the results were coded by comparing the claim with the evidence in the saved plot (Q11, Q13, Q15, Q17). Overall, 9.5% of claims did not fit what was seen on the plot, 38.5% of responses noted using one or two tools that could be seen on the plot or a strategy consistent with what appeared on the plot, whereas 52% noted three or more consistent tools/strategies. Of more interest were responses about what surprised the students across the four random samples (Q18). Eighty-six percent of responses (Code 2) fit some or all of the four plots, with either “yes” or “no” answers acceptable if justified in the light of the TinkerPlots files. Examples included the following.

Code 2: Yes. I am surprised about the results because I have proved the scientist[s] were wrong. [ID35] No. Because we knew that all the data would vary so we weren’t surprised. [ID10] [No] I thought that it was 50/50 chance and our results were 2 brown and 2 other so I was right. [ID31] No, because all the samples are generally similar and don’t usually include outliers. [ID61]

Students were then asked specifically to make a new decision about Brown eyes and reaction time based on their four random samples (Q19). Fifty-three percent of responses were specifically reasoned based on the saved files (Code 2) but 42% did not provide concrete evidence (Code 1).

Code 1: Brown eyed Year 6 students don’t have faster reaction times. This is because our studies of random samples prove that eye color isn’t related to reaction time. [ID58] Mostly yes, the data set shows us so. [ID67]

Code 2: Brown eyes and other colored eyes are the same. The reason is because first they were same, then other, then brown, colored eyes, then brown. Although brown has majority they were still very similar. [ID66] Yes, because the mean, mode and hat plots support the statement. The hat plots represent the cluster which on the brown side is further down the left meaning it’s faster. The mean which is the value representing a normal, untrained person was usually faster. [ID148]

The next question, Q20, reinforcing the idea of considering the decision for all Grade 6 students, provided an interesting challenge for the students given their limited background. Some still replied “No” because of the small size of their random samples or because everyone is different (Code 1). Others said “Yes”, reasoning that the samples had been random and from all over Australia (Code 2). A few students related their conclusion to the wider context beyond eye color determining reaction time (Code 3).

Code 3: No, because this is only some evidence, we can't be sure without taking various data. [ID3] No because everyone is different and it depends on more than eye color. [ID78]

When asked what they could do to be more certain (Q21), 39% were happy with their conclusion and needed no more information, or would not be happy until every Grade 6 in Australia was tested (Code 1). Sixty-one percent suggested taking more random samples and/or increasing the sample size (Code 2).
The maximum score possible for the questions and conclusion about the four random samples was 25; the range of students' scores was 8–21, with a median of 16. The distribution, shown in Fig. 6, is only slightly skewed to the left. Seventy-nine percent of students scored 14 or more; the rest were basically able to describe the inconsistency across the samples, and only two students struggled, with scores below 10.

Fig. 6. Scores for the Workbook questions on decisions for the random data.


4.4. Main part of the activity (c): decision based on "population"

Using the 1786 reaction times in the ABS "population" in TinkerPlots (see Fig. 3) and comparing Brown eyes with Others, students were asked for a final decision for all Grade 6 students in Australia (Q22). When examining the plot, the median and mode were both 0.4, and Other had the lowest value in the plot. The ranges for the two data sets were (0.12, 0.98) for Other eye colors and (0.15, 0.98) for Brown eyes, and the intervals for the crowns of the hat plots were (0.34, 0.46) for Other eye colors and (0.35, 0.46) for Brown eyes. The difference in means was only 0.003 s, favoring Brown eyes, but this led many students to conclude that the Brown-eyed Grade 6 students did indeed have faster reaction times. Some students used the smallest value in the plot and the wider crown of the hat plot to conclude that Other-colored eyes were faster. A few students were misled by the fact that there were more students with Other-colored eyes in the ABS "population"; this was used as the reason why other children had faster reaction times. Students who did not conclude that the two groups of students had the same reaction times were not rewarded if they claimed strong certainty in the final question. Only 36% of students concluded that there was no difference in reaction time between the two groups, and when asked about their certainty, only 68% of those who concluded no difference expressed certainty in that conclusion. The following are examples, some of which considered the variation, or lack of it, in the distributions.

• Decision: No, they are around the same times so the brown eyes are the same. We used hat plot to see where the most is. I think that this is because you don't see differently with different colored eyes. Certainty: I feel more certain now even though it is not all the Grade 6 students in Australia. [ID08]
• Decision: I think that overall brown eyed students and other eye color students are equal in the end. I am looking at my third plot and overall and out of 1786 year 6 students it looks like it is all even! All of the dots are clumped up in the same area and they are spread in the same areas of the plot too! Certainty: I am 65% certain that this data would be just like a year sixer in Australia and close to be just like a year sixer! [ID77]
• Decision: No I don't think so. They have the same mean, mode, median and hat plot. You can only have this situation if they have very similar features or the same, so others and Brown are very similar or the same. Certainty: I felt certain because the data is true and the data is very wide ranged, of 1786, all around Australia. [ID146]
• Decision: No, I would say they are pritty [pretty] even because the fastes[t] times from other and brown are very close to each other. The slowest ones are also very close to each other. Certainty: I am very cearten [certain] because there is a very very very large sample now. [ID83]
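
The kind of two-group summary the students read off the plots (means, medians, full ranges, and the middle 50% of each group, roughly the interval covered by the crown of a TinkerPlots hat plot) can be written out in a short script. The sketch below uses simulated reaction times, so the numbers are illustrative only; the function and variable names are hypothetical rather than part of the project materials.

```python
import numpy as np

def group_summary(times: np.ndarray) -> dict:
    """Summaries analogous to the tools the students read from the plots."""
    q1, q3 = np.percentile(times, [25, 75])
    return {
        "mean": round(float(times.mean()), 3),
        "median": round(float(np.median(times)), 3),
        "range": (round(float(times.min()), 3), round(float(times.max()), 3)),
        "middle_50": (round(float(q1), 3), round(float(q3), 3)),  # roughly the hat-plot crown
    }

rng = np.random.default_rng(seed=7)
# Hypothetical reaction times (seconds); both groups drawn from the same
# distribution, mirroring a "no real difference" situation.
brown = rng.normal(0.42, 0.10, size=900).clip(0.12, 0.98)
other = rng.normal(0.42, 0.10, size=886).clip(0.12, 0.98)

for name, data in (("Brown", brown), ("Other", other)):
    print(name, group_summary(data))

print("Difference in means:", round(float(brown.mean() - other.mean()), 3), "s")
```

Reading the two rows of output side by side mirrors the judgement the students had to make: whether any difference in the summaries is large relative to the spread of the data.
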

This final generalization for the "population" was the most difficult for students to make. For the final two questions, only 26% of students scored the maximum of 4, whereas 6% did not score. Combining all of the Workbook sections, the distribution of scores, ranging from 18 to 39 out of a possible 48, is shown in Fig. 7, with a median of 31. Seventy-five percent of students scored 28 or more and 94% scored half or more of the total possible. For the bottom quartile of students, the issue was not a complete lack of appreciation of the activity but inconsistency across different subtasks.

4.5. Post-activity assessment

Because the students had now been using TinkerPlots for nearly two years in the investigations in the project, it was of interest to document their understanding of the tools of statistical analysis as they are employed using the software. Hence Part A of the Assessment (see Appendix B) was mostly procedural, focusing on recognizing the format in which the tools appeared. Many of the initial items required only short answers; the percentages correct for these items, and for Item C1a, are shown in Table 5. The last three items in Part A gave the opportunity to show some aspects of conceptual understanding and were assessed with multiple levels. In all three cases, Code 1 reflected a procedural response or one with limited use of either the graph or the context. Code 2 responses demonstrated an appreciation of both the representation and some aspect of context.

Fig. 7. Total Workbook scores for all parts of the activity.


Table 5
Assessment results by item for short questions (Appendix B).

Item  Description                      Percent Correct
A1    Name one attribute               98%
A2    Give its value                   82%
A3    Name attribute on plot           88%
A4a   Scale intervals                  70%
A4b   Begin/end of scale               75%
A5    Range                            58%
A6    Value occurring most often       83%
A7    Name of value                    61%
A8    Place X on plot for case 14      66%
A9a   Which larger: mean/median        59%
C1a   Which plot most variation        84%

Table 6
Example responses for Assessment items A9b, A10, and A11.

A9b Explain which of mean or median will be larger
Code 1 (34%):
  I chose mean because the mean is all the data and the median is the other stuff. [ID44]
  Because the median is found by the numbers being in ascending order but the mean is found by adding them all up. [ID78]
Code 2 (24%):
  The mean would be higher considering there is a few outliers, which especially when finding the average will extend the mean. [ID48]
  I think the mean would be larger because you only need to find the average by counting the values and dividing. However the median is finding the middle and there are more values clustered on the left than the right making the median more to the left. [ID60]

A10 What the shape tells you
Code 1 (60%):
  The shape of the plot is clustered up the left end (lower values). [ID66]
  This tells us most students don't spend so much time on the phone per week. [ID66]
Code 2 (22%):
  The plot shows us that most people spend very little time on their phones. There is a very big bump on the plot within 0–4 h. [ID70]
  After 4 the other values vary, 102 students do not eather [either] have a phone or just don't go on it that much. [ID91]

A11 Ignore some data? Why or why not?
Code 1 (63%):
  "No." I think all of this data is very important. [ID74]
  "No." I don't think that any of the data sets should be ignored because they help found out a mean, median, and mode. [ID76]
Code 2 (25%):
  "No." Because it tells about the students has phone or not and how many hours they are with phones. [ID132]
  "No." Because the data set helps us gain knowledge of how many hours people spend their time on the phones. [ID10]
  "Yes." Because most of the values are piled up at using the phone for one hour each week and the only reason to check those values would be for checking their gender or checking how many hours they do their homework each week. [ID54]

As seen in Table 6, it was more difficult for students to give conceptually-based answers to these questions.

Part B showed two pairs of plots and asked for three reasons why the proposed hypotheses would be supported. For the plots in Items B1 and B2, Code 1 reasons why students walking would get to school sooner included procedural data reading without comparison or mention of the size of the difference, whereas Code 2 responses included comparisons. Item B3, on whether a sample of 15 "walk" and 15 "car" students from their school would be similar to or different from the plots shown, was conceptually much more difficult; a Code 3 required both an acknowledgement of uncertainty and a statistical justification. Code 1 responses either had a statistical justification but no uncertainty, or gave a personal reason including lack of certainty. A contextual justification with uncertainty was coded 2, whereas uncertainty with a statistical reason was coded 3. Examples are given in Table 7.

For Part C, although most students could identify Bus as having the greatest variation (C1a) (Table 5), a complete explanation of why, referring to all three plots, was more difficult (C1b). Code 1 responses tended to make comments about the bus data without reference to the other two plots. Code 2 responses included comparison, whereas Code 3 responses went further, describing variation in terms of spread and clusters, as well as relevant statistics. Examples are given in Table 8.

Combining scores on all three parts of the Assessment, the totals are shown in Fig. 8. With a total possible score of 36, some students did very well, and 76% scored at least half marks (18). Fifty percent of students scored 24 or more. Although both teachers and researchers were disappointed in some scores, of the ten scores of 11 or under, only two came from ESL students.

4.6. Knowledge recall after two weeks

The relationship between the scores on the Workbook questions and the Assessment indicated that the retention of understanding was not strong.


Table 7
Example responses for Assessment items B1, B2, and B3.

B1 Reason walk gets to school sooner than bus
Code 1 (31%):
  There is a cluster around 0–5 min. for walking. This shows that walkers get to school after 0–5 min usually. [ID133]
  People who go by bus are mostly spread around the middle to end. [ID135]
  The mean is larger for the people who travel by bus. [ID52]
Code 2 (42%):
  The highest outlier for taking the bus is 75 and the highest for walking is 60. [ID9]
  The average amount (sic) of students get to school by walking at 4 min to 15 min, and the average time for the bus is 20 min to 45 min. [ID9]
  The quickest time for the bus is 8 min, while the quickest for walking is around 2 min. [ID9]

B2 Reason no difference between walk and car
Code 1 (41.3%):
  Plot shape is very similar. [ID15]
  The averages are about the same. [ID150]
Code 2 (31.3%):
  Theo is correct because most of the data is clustered near the beggining [beginning] for both travel methods; car and walking. This means that it takes a shorter period of time. [ID43]
  By looking at the averages, we can see that they are very similar, but we know that cars have a faster time than walking by about 2 min. [ID43]
  The hat plot tells us that most of the data ranges from about 5–15 min for both methods. Therefore we can infer that the two methods have not much of a difference. [ID43]

B3 Sample from your school, same or different?
Code 1 (41%):
  It would look diff[e]rent as their [there] would be less people making my graph more diverse. [ID38]
  Different, because we have people who live so far away it takes quite a while to get here. [ID8]
  The data would look different because there are different people and different life style. [ID31]
Code 2 (31%):
  I think it would look very similar as most people live in the same area. [ID15]
  Yes, I do think that our plot would be similar, because the people who walk to school generally know that they won't be late just by walking. The people who live far away are scared of being late, so they drive to school. [ID59]
  I think would look different, I think that most students would go by car because there is a very steep up hill that you need to get over to go to school and most of the people I know in school go by car. [ID60]
Code 3 (19%):
  I think that it would look very different, because in total there are 30 students from our school who have been selected, but this plot has selected a random sample of 120 students. [ID134]
  You can't tell because there are more than 900 students at my school and the 15 students could be anyone. [ID78]
  I think that they would be similar, but less than Theo's plots because of their sample size. If there was the same amount [sic] of values then the median, mean, and mode would be similar, like in Theo's plots. [ID100]

Table 8
Example responses for Assessment item C1b: Explain which plot shows the most variation.

Code 1 (31%):
  Bus. Because the bus has more spread out times ac[c]ording to the evedincy [evidence]. [ID5]
  Bus. I think this because there are lots of different times on how long students can get to school. [ID127]
Code 2 (28%):
  Bus. Because walking and car have around the same times but bus is a lot more spread out. [ID8]
  Bus. Walk and car are both clustered at one end whereas bus is spread along the whole plot from 8 all the way to 75. [ID151]
Code 3 (14%):
  Bus. The range for walk is 1–60 mins, car is 1–45 mins and bus is 8–74 mins and also the hat plot and median is towards the middle of the scale instead of the front half. [ID24]
  Bus. The bus plot has a very widely spread data. We know this because the hat plot ranges from about 20–45 min, unlike for the other two plots, which are very clustered near the beginning of the plot. [ID43]
  Bus. The walk and car plots are very similar but the bus plot is very different and the hat plot evidently shows this. The middle 50% for walk and car are on the left-side of the scale, whereas, the bus plots mid 50% is in the center. This evidently shows that the bus plot varies the most. [ID105]

Fig. 8. Total scores on the post-activity Assessment.


The correlation between the Grand Total for the Workbook and the Total for the Assessment was 0.271 (p < 0.01), representing only 7.3% of the variation in Assessment scores explained by the Workbook scores (0.271² ≈ 0.073). The best predictor of the total Assessment score was the total on the four questions about Measures at the beginning of the Workbook (Preliminary research question). The correlation of scores was 0.369 (p < 0.001), or 13.6% of the variance in the Assessment explained by the questions on the basic Measures (0.369² ≈ 0.136), whereas between the rest of the Workbook questions combined (Main part of the activity research question) and the Assessment the correlation was only 0.079, or less than 1% of variance explained. The application of the conceptual knowledge gained from the activity to the new contexts in the Assessment was more difficult than the application of procedural knowledge.

5. Discussion

5.1. Summary

The activity reported here was the sixth of seven major investigations that were part of a 3-year project introducing students, starting in Grade 4, to beginning inference. The purpose of the project was to build the intuitions and conceptual understanding that would lay the foundation for later, more formal work in statistics and provide a foundation for critical statistical literacy. Given the experience the students had gained in the previous investigations, it was appropriate to challenge their thinking both by extending the context of reaction time to comparing two groups and by motivating the activity with a questionable claim from the media. No previous study had been found following student learning in a similar activity for elementary school students.

The use of random samples from the TinkerPlots Sampler was intended to emphasize two features of random samples: the same chance for each person in the population to be chosen and the variation that occurs in outcomes for different random samples of the same size. The purpose was similar to that of Ben-Zvi (2006) in his work "growing samples", where students enlarge their original data set several times to increase sample size. The environment and time constraints in the school did not allow for expanding the sample size, so the approach of taking multiple random samples of the same size as the class was considered a valid alternative in terms of decision-making. The questions in the students' Workbooks were structured around the four steps of the practice of statistics because it was felt the students were not ready to write reports on their own.

The students accepted that the ABS "population" of 1786 Grade 6 students was very large compared to their class and their random samples but small compared to the approximately 280,000 Grade 6 students in Australia. Throughout the activity there had been no definitive statement from the teachers about "statisticians accepting random sample sizes of 1500" as large enough to make decisions with a relatively high degree of certainty. Students made decisions one way or the other: some saying 1786 was large and others saying it was small. This led to different decisions on certainty, one of the main reasons for lower code scores on the final question in the Workbook. Up until the final questions most responses were appropriate, with the codes (1 or 2) mainly reflecting the relationship of the procedural and conceptual reasoning shown (Hiebert & Carpenter, 1992). Overall it was judged that most of the students showed the capacity to take on and appreciate a statistical investigation in the context of comparing two groups.
Although students many times discussed the variation between the groups or within the groups, they rarely used the word "variation." At this grade level the teachers were not asked to stress formally the terms "between" and "within." It was disappointing that some students did not score as well on Parts B and C of the Assessment, which were based on TinkerPlots graphs. Although students had filled in their Workbooks individually during class, they worked in pairs using TinkerPlots. It may have happened that one of the pair took over the manipulation on the computer screen, with the other not paying attention and hence recalling less of the activity. It was expected that the context of travelling to school presented in Parts B and C would be familiar to students, and it appeared from the responses that it was. The students, however, had never seen three graphs together and been asked to compare them in pairs or all three at once. It could be that the cognitive load of the comparisons became too much for some of the students. The explanations for decision-making on the Assessment again varied in their conceptual sophistication and extent but still showed basic procedural understanding of the tools being used and of the task of comparing two (or more) groups. Some of the responses were very encouraging in their levels of understanding. The teachers used the results in writing their students' reports and generally were pleased with the capacity of their students to carry out the activity.

Overall this study contributes to three areas of statistics education research. It adds to the growing focus on classroom interventions to assess the capacity of elementary students to understand the practice of statistics and decision-making with uncertainty (e.g., Ben-Zvi et al., 2012; Makar, 2014; Zieffler & Fry, 2015). It also continues the consideration of procedural and conceptual understanding, long of interest in other parts of the mathematics curriculum (Baroody et al., 2007; Hiebert & Grouws, 2007). Third, in introducing the comparison of two groups, it continues the interest of Cobb (1999), Makar et al. (2011), and Allmond and Makar (2014) in this topic at the school level, and recent innovations at the tertiary level (e.g., delMas et al., 2014; Frischemeier, 2014). Moving to comparing groups adds more conceptual complexity to the investigation, but the motivation for supporting or refuting a conjecture is high.


The strongest correlation was between the responses given by the students at the beginning of the activity on their understanding of the four measures of center and the Assessment. Students had been working with these tools since the middle of the previous year, and hence this mainly procedural understanding was more readily applied than the conceptually-based extension of the practice of statistics to two groups, which they had only encountered two weeks previously. The statistical significance (p < 0.01) of the correlation between the total scores on the Workbook and the Assessment is encouraging, but this is mitigated to some degree by the low percentage of variance explained. Groth and Bergner (2006) and Groth (2014) did not have data to consider the question of retention for their preservice teachers, but it would be of interest to design a study that would consider the retention of the procedures and concepts at the end of preservice teacher courses on statistics education. This study is, however, a promising start, with an indication that some students did very well and all appeared to enjoy the challenge of the investigation.

5.2. Critical statistical literacy

The choice of the topic for the students' investigation grew from the first author's association with a Grade 10 teacher who found an article in the local newspaper and based a classroom activity on it for his students (Watson, 2008). It was felt that the anecdotal evidence from that lesson should be followed up with evidence on students' capacities to engage individually with the concepts related to both statistical literacy and the practice of statistics. Having an authentic report provided a powerful opportunity to do this.

It was felt that the students in the project by Grade 6 had the three fundamental requirements of statistical literacy to think critically about the claim in the article concerning eye color and reaction time (Watson, 2006). First, as shown in the questions at the start of the activity, most were familiar with the tools required; and second, as the activity progressed, they showed that they could use the tools in the context chosen. In two of the classes (6C and 6S) students were aware of and commented on the unequal sample sizes for their class data in relation to eye color. Third, they appreciated the need to assess critically the claim in the article using their evidence, although some struggled with which evidence to use.

In one class a student asked if this were a "real" newspaper article or a set-up question to give them something to investigate, given a previous activity they had completed. The presentation in Fig. 1 was genuine and they were convinced, but they then asked questions about how the researchers had done their study. Most students were skeptical about the truth of the claim and hence happy to investigate it. At one point a pair of boys waved their hands and one said, "But I've got brown eyes and his are blue, but he had a faster time on the computer!" The teacher said, "Ah, a counter-example!" A girl in a pair immediately replied, "But I've got a counter-counter-example. I've got brown eyes and she has blue AND I'm faster!" There was humor around the classroom as pairs compared their eye colors and times. This anecdote supports the need to use authentic contexts, even with primary school students, who today do not want to waste time on fictional data.

Two things were disappointing about some of the final conclusions of students based on the large ABS data set.
First, a few students still claimed that brown-eyed students were faster because of the 0.003 s difference in means, or claimed that others were faster because the fastest time was recorded by a child with other-colored eyes. Using a difference of 0.003 s to claim a contextual difference may be related to what children carry over from learning about subtraction elsewhere in mathematics: "a difference is a difference", and a correct calculation means something wherever it is found. Including context in the mix increases the conceptual complexity of thinking. Although these students had the tool needed for the task (the first requirement for statistical literacy), they had not managed to place it meaningfully in the context to suggest whether such a difference would be observable or not. In all classes the teachers discussed this and asked questions about detecting the difference, either at times during the activity or in the wrap-up at the end. Picking just the lowest value in the data set to make a decision is reminiscent of the observations of Konold, Pollatsek, Well, and Gagnon (1997) that students often focus on individual comparisons rather than the comparison of aggregates.

Second, some students continued to want a larger sample or the whole population before they would become more certain of the sameness of the reaction times. It seems clear that much experience is required in building intuitions about sample size and the confidence one has in making a decision in an investigation. This was still an issue with Grade 9 students in a study by Bill, Henderson, and Penman (2010). Tertiary courses, for example, introduce t-tests and p-values that depend on n, the sample size. Even then, however, the interpretation of the p-value is not a trivial task (Reaburn, 2014). It will be interesting to follow future research on this aspect of the practice of statistics.
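
The dependence of a formal test on sample size, noted above, can be made concrete with a small simulation. The sketch below is hypothetical: it fixes a true difference in means of 0.003 s (the order of the difference seen in the ABS data), assumes a spread of about 0.10 s, and runs a standard two-sample t-test (scipy.stats.ttest_ind) at several per-group sample sizes.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=3)
true_diff, sd = 0.003, 0.10  # assumed difference in means and spread (seconds)

for n in (30, 1786, 100_000):  # per-group sample sizes
    brown = rng.normal(0.420, sd, size=n)
    other = rng.normal(0.420 + true_diff, sd, size=n)
    t_stat, p_value = stats.ttest_ind(brown, other)
    print(f"n = {n:>6}: t = {t_stat:6.2f}, p = {p_value:.3f}")
```

For a fixed difference this small, the p-value will usually remain large at class-sized and even ABS-sized samples and only become reliably small when n is very large, which is one way of seeing why sample size and certainty were so entangled for the students.
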


5.3. Limitations

Whenever an authentic activity such as this is carried out in a classroom, it is not possible to control the number of students with brown or non-brown eyes. In this study there were two classes with small numbers of non-brown-eyed students. In other classes in different parts of the country the imbalance would be in the other direction. Part of appreciating the issues of sampling in statistics is to experience such imbalance. The inclusion of the ABS CensusAtSchool data in the study was intended to compensate for this to some extent. Even there, of course, the random samples students collected were sometimes imbalanced.

For students just beginning to make inferences based on samples, it is likely that it would have been better to select the same number of students with Brown eyes as with Other-colored eyes for the ABS "population." The complication of there being more Other-colored-eyed students would not have interfered with the decision-making. Most activities introducing older students to comparing groups begin with contexts where there is expected to be a difference between the two groups (e.g., Shaughnessy, Chance, & Kranendonk, 2009). The authentic context chosen for this study led to the outcome of "no difference." This result is undoubtedly more difficult to tie to the language of variation for young students because there is little variation overall between the groups. Perhaps an interim activity based on a situation with a convincing difference would have been a better starting point for focusing on the two types of variation. Time constraints did not allow for this to happen. An activity with a large difference between the means of the two groups, however, would not have exposed the difficulties some students had with the relative sizes of numbers. Such contexts also pose the difficulty of students "knowing" there is going to be a difference before they start.

6. Conclusion

The students in this study were entering the third year of a project with the aim of developing statistical literacy through the understanding of beginning inference. They were reinforcing their experience with the practice of statistics (Franklin et al., 2007), in a meaningful context (Rao, 1975) of comparing two data sets, encountering consideration of potential variation of two types (Moore, 1990), and requiring an evidence-based generalization acknowledging their levels of certainty (Makar & Rubin, 2009). The authenticity of the question provided a link to the goals of statistical literacy at the school level (Watson, 2006) and to the further goal of Gal (2002) of being able to communicate concerns about claims made, including evidence.

The progress made by the students gives encouragement that intuitions can be built at the school level that will underpin future formal engagement with statistics. Although continuing reinforcement is required, the students appear to have made a reasonable and motivating beginning. As students progress through middle school and high school, they can be introduced to many meaningful authentic contexts and more tools that will continue to reinforce their understanding of the practice of statistics and its usefulness for becoming critical thinkers when meeting statistical claims outside the classroom. As their experience expands, especially in relation to the use of evidence to provide a degree of certainty in decision making, they should be better prepared to take advantage of recent trends in teaching statistics at the tertiary level (e.g., Cobb, 2007; Garfield & Ben-Zvi, 2008).

Acknowledgements

This article arose from research funded by an Australian Research Council (ARC) Discovery Grant (number DP20100158). Any opinions, findings, and conclusions or recommendations expressed are those of the authors and do not necessarily reflect the views of the ARC. We wish to acknowledge the excellent support of our senior research assistants, Jo Macri, who organized all materials and set-up for the classrooms and typed the transcripts, and Dr Ben Kelly, who coded the Workbooks and Assessments in consultation with the first author.


Appendix A


Appendix B


Appendix C. Rubrics for Workbook Questions

Preliminary questions
Definition & example of Mode (Q2), Median (Q3), Mean (Q4), Hat plot (Q5)
  Code 0: No response or a wrong measure described
  Code 1: A weak reference such as "middle" or "average"
  Code 2: A complete multi-part definition (perhaps with vague example)
  Code 3: A complete definition related to an example in a realistic context

Class data analysis
Decision for class (Q6, Fig. 2)
  Code 0: Inconsistent with class plot (Appendix A)
  Code 1: Consistent with class plot (Appendix A)
Explain (Q7, Fig. 2)
  Code 0: No response, idiosyncratic
  Code 1: Notes at least one of the four tools or an identifiable strategy
  Code 2: Explains how the tool or strategy helped make the decision
All Grade 6 students? (Q8, Fig. 2)
  Code 0: No response, idiosyncratic or "yes"
  Code 1: "No" and a vague reason such as "we're all different"
  Code 2: "No", noting the small sample, non-representativeness, or bias
More certain? (Q9, Fig. 2)
  Code 0: No response, idiosyncratic
  Code 1: Vague reference to "collecting more data"
  Code 2: Recognising need for a random sample

Random sample description
What does the random ABS plot tell you? (Q10, Q12, Q14, Q16)
  Code 0: No response, idiosyncratic
  Code 1: Brief report on which was the faster group
  Code 2: Extended accurate description of plot (implicit note of variation)
Tool/s and process used for analysis of random ABS data (Q11, Q13, Q15, Q17)
  Code 0: No response, idiosyncratic
  Code 1: One or two tools or strategies
  Code 2: At least three tools and/or strategies (implicit note of variation)

Random sample analysis
Any surprises from random samples? (Q18)
  Code 0: No response, idiosyncratic
  Code 1: Response did not fit plots
  Code 2: Response fit plots
Decision based on random samples? (Q19)
  Code 0: No response, idiosyncratic
  Code 1: Vague reason for decision
  Code 2: Convincing reason for decision
Decision for all Grade 6 students? (Q20)
  Code 0: No response, idiosyncratic
  Code 1: A vague "no" and reason such as "people are different"
  Code 2: A "no" related to sample size
  Code 3: A tentative "yes" but acknowledging uncertainty
How to be more certain? (Q21)
  Code 0: No response, idiosyncratic
  Code 1: Claim of "need for whole population"
  Code 2: Recognition of need for more data or more samples with claim for population

Population analysis
Decision for ABS "population" (Q22)
  Code 0: No response, idiosyncratic
  Code 1: "different" based on the small difference in means or a single extreme data value
  Code 2: "same" based on an accurate description of the plots
How to be more certain of decision? (Q23)
  Code 0: "more certain" or "certain" for responses that said "different" to Q22
  Code 1: Lack of certainty for all responses to Q22
  Code 2: "more certain" or "certain" for responses that said "same" to Q22

References Australian Curriculum, Assessment and Reporting Authority (ACARA) (2013). General capabilities in the australian curriculum, january, 2013 (updated September 2014) Sydney, NSW: ACARA. Australian Curriculum, Assessment and Reporting Authority (2015). Australian curriculum: Mathematics, Version 8.1, 15 December 2015Sydney, NSW: ACARA. Allmond, S., & Makar, K. (2014). From hat plots to box plots in TinkerPlots: supporting students to write conclusions which account for variability in data. In K. Makar, B. deSousa, & R. Gould (Eds.). Sustainability in statistics education (Proceedings of the 9th international conference on the teaching of statistics, Flagstaff, Arizona, July 13–18)Voorburg, The Netherlands: International Statistical Institute. Retrieved from http://iase-web.org/icots/9/proceedings/pdfs/ICOTS9_2E1_ALLMOND.pdf. Bakker, A., & Derry, J. (2011). Lessons from inferentialism for statistics education. Mathematical Thinking and Learning, 13, 5–26. Baroody, A. J., Feil, Y., & Johnson, A. R. (2007). An alternative reconceptualization of procedural and conceptual knowledge. Journal for Research in Mathematics Education, 38, 115–131. Ben-Zvi, D., Gil, E., & Apel, N. (2007). What is hidden beyond the data? Helping young students to reason and argue about some wider universe. In D. Pratt, & J. Ainley (Eds.). Reasoning about informal inferential statistical reasoning: A collection of current research studies. proceedings of the fifth international research forum on statistical reasoning, thinking, and literacy (SRTL-5). Ben-Zvi, D., Aridor, K., Makar, K., & Bakker, A. (2012). Students’ emergent articulations of uncertainty while making informal statistical inferences. ZDM Mathematics Education, 44, 913–925. Ben-Zvi, D. (2006). Scaffolding students’ informal inference and argumentation. In A. Rossman, & B. Chance (Eds.). Working cooperatively in statistics education (Proceedings of the 7th international conference on the teaching of statistics, Salvador, Bahai, Brazil, July 2–7)Voorburg, The Netherlands: International Association for Statistical Education and the International Statistical Institute. Retrieved from http://iase-web.org/documents/papers/icots7/2D1_BENZ.pdf. Biehler, R. (2007) Students’ strategies of comparing distributions in an exploratory data analysis context. Paper presented at the 56th session of the international statistical institute Lisbon. Available at: http://iase-web.org/documents/papers/isi56/IPM37_Biehler.pdf. Bill, A. F., Henderson, S., & Penman, J. (2010). Two test items to explore high school students’ beliefs of sample size when sampling from large populations. In L. Sparrow, B. Kissane, & C. Hurst (Eds.). Shaping the future of mathematics education (Proceedings of the 33rd annual conference of the mathematics education research group of Australasia, Fremantle, Australia, July 3–7) (pp. 77–84). Sydney: MERGA. Brodesky, A., Doherty, A., & Stoddard, J. (2008). Digging into data with TinkerPlots. Emeryville, CA: Key Curriculum Press. Cobb, P., Confrey, J., diSessa, A., Lehrer, R., & Schauble, L. (2003). Design experiments in educational research. Educational Researcher, 32(1), 9–13. Cobb, P., Jackson, K., & Munoz, C. (2016). Design research: A critical analysis. In L. D. English, & D. Kirshner (Eds.). Handbook of international research in mathematics education (pp. 481–503). (3rd ed.). New York: Routledge. Cobb, P. (1999). Individual and collective mathematical development: The case for statistical data analysis. 
Mathematical Thinking and Learning, 1, 5–43. Cobb, G. (2007). The introductory statistics course: A Ptolemaic curriculum? Technology Innovations in Statistics Education, 1(1), Retrieved from http://escholarship. org/uc/item/6hb3k0nz. Common Core State Standards Initiative (2010). Common core state standards for mathematics. Washington, DC: National Governors Association for Best Practices and the Council of Chief State School Officers. Retrieved from http://www.corestandards.org/assets/CCSSI_Math%20Standards.pdf. delMas, R., Garfield, J., & Zieffler, A. (2014). Using TinkerPlots to develop tertiary students’ statistical thinking in a modeling-based introductory statistics class. In D. Wassong, P. R. Fischer, R. Hochmuth, & P. R. Bender (Eds.). Using tools for learning mathematics and statistics (pp. 405–420). Heidelberg: Springer Spektrum. English, L., & Watson, J. (2015a). Exploring variation in measurement as a foundation for statistical thinking in the elementary school. International Journal of STEM Education, 2(3), http://dx.doi.org/10.1186/s40594-015-0016-x. English, L., & Watson, J. (2015b). Statistical literacy in the elementary school: Opportunities for problem posing. In F. M. Singer, N. Ellerton, & J. Cai (Eds.). Problem


posing: From research to effective practice (pp. 241–256). New York: Springer. English, L., & Watson, J. (2016). Development of probabilistic understanding in fourth grade. Journal for Research in Mathematics Education, 47, 28–62. English, L. D. (2014). Statistics at play. Teaching Children Mathematics, 21(1), 37–44. Franklin, C., Kader, G., Mewborn, D., Moreno, J., Peck, R., Perry, M., et al. (2007). Guidelines for assessment and instruction in statistics education (GAISE) report: A preK12 curriculum frameworkAlexandria, VA: American Statistical Association. Retrieved from http://www.amstat.org/education/gaise/. Frischemeier, D. (2014). Comparing groups by using TinkerPlots as part of a data analysis task −Tertiary students’ strategies and difficulties. In K. Makar, B. deSousa, & R. Gould (Eds.). Sustainability in statistics education (Proceedings of the 9th international conference on the teaching of statistics, Flagstaff, Arizona, July 13–18) Voorburg, The Netherlands: International Statistical Institute. Retrieved from http://iase-web.org/icots/9/proceedings/pdfs/ICOTS9_8J3_FRISCHEMEIER.pdf. Gal, I., Rothschild, K., Wagner, D.A., (1989). Which group is better? The development of statistical reasoning in elementary school children. Paper presented at the meeting of the Society for Research in Child Development Kansas City, MO. Gal, I., Rothschild, K., Wagner, D.A., (1990). Statistical concepts and statistical reasoning in school children: Convergence or divergence? Paper presented at the meeting of the American Educational Research Association, Boston. Gal, I. (2002). Adults’ statistical literacy: Meanings, components, responsibilities. International Statistical Review, 70, 1–51. Garfield, J., & Ben-Zvi, D. (2008). Developing students’ statistical reasoning: Connecting research and teaching practice. New York: Springer. Groth, R. E., & Bergner, J. A. (2006). Preservice elementary teachers’ conceptual and procedural knowledge of mean, median, and mode. Mathematical Thinking and Learning, 8(1), 37–63. Groth, R. E. (2014). Prospective teachers’ procedural and conceptual knowledge of mean absolute deviation. Investigations in Mathematics Learning, 6(3), 51–69. Hiebert, J., & Carpenter, T. P. (1992). Learning and teaching with understanding. In D. A. Grouws (Ed.). Handbook of research on mathematics teaching and learning (pp. 65–97). New York: National Council of Teachers of Mathematics & MacMillan. Hiebert, J., & Grouws, D. A. (2007). The effects of classroom mathematics teaching on students’ learning. In F. K. LesterJr. (Ed.). Second handbook of research on mathematics teaching and learning (pp. 371–404). Charlotte, NC: National Council of Teachers of Mathematics and Information Age Publishing. Kader, G. D., & Jacobbe, T. (2013). Developing essential understanding of statistics for teaching mathematics in grades 6–8. Reston, VA: National Council of Teachers of Mathematics. Konold, C., & Miller, C. D. (2005). Tinkerplots: Dynamic data exploration. [Computer software]. Emeryville, CA: Key Curriculum Press. Konold, C., & Miller, C. D. (2011). TinkerPlots: Dynamic data exploration [Computer software, version 2.2]. Emeryville, CA: Key Curriculum Press. Konold, C., Pollatsek, A., Well, A., & Gagnon, A. (1997). Students analyzing data: Research of critical barriers. In J. B. Garfield, & G. Burrill (Eds.). 
Research on the role of technology in teaching and learning statistics (Proceedings of the 1996 IASE Round Table Conference, University of Granada, Spain, 23–27 July)Voorburg, The Netherlands: International Statistical Institute. [Retrieved from] http://iase-web.org/documents/papers/rt1996/13.Konold.pdf. Lavigne, N. C., & Lajoie, S. P. (2007). Statistical reasoning of middle school children engaging in survey inquiry. Contemporary Educational Psychology, 32(4), 630–666. Makar, K., & Rubin, A. (2009). A framework for thinking about informal statistical inference. Statistics Education Research Journal, 8(1), 82–105. Retrieved from http:// iase-web.org/documents/SERJ/SERJ8(1)_Makar_Rubin.pdf. Makar, K., Bakker, A., & Ben-Zvi, D. (2011). The reasoning behind informal statistical inference. Mathematical Thinking and Learning, 13, 152–173. Makar, K. (2014). Young children’s explorations of average through informal inferential reasoning. Educational Studies in Mathematics, 86(1), 61–78. Ministry of Education (2009). The New Zealand curriculum: Mathematics standards for years 1–8. Wellington, NZ: Author. Retrieved from http://nzcurriculum.tki.org.nz/ The-New-Zealand-Curriculum/Learning-areas/Mathematics-and-statistics. Moore, D. S., & McCabe, G. P. (1989). Introduction to the practice of statistics. New York: W.H Freeman. Moore, D. S. (1990). Uncertainty. In L. A. Steen (Ed.). On the shoulders of giants: New approaches to numeracy (pp. 95–137). Washington, DC: National Academy Press. National Council of Teachers of Mathematics (1989). Curriculum and evaluation standards for school mathematics. Reston, VA: Author. National Council of Teachers of Mathematics (2000). Principles and standards for school mathematics. Reston, VA: Author. National Council of Teachers of Mathematics (2006). Curriculum focal points for prekindergarten through grade 8 mathematics: A quest for coherence. Reston, VA: Author. Rao, C. R. (1975). Teaching of statistics at the secondary level: An interdisciplinary approach. International Journal of Mathematical Education in Science and Technology, 6, 151–162. Reaburn, R. (2014). Introductory statistics course tertiary students’ understanding of p-values. Statistics Education Research Journal, 13(1), 53–65. Available at http:// iase-web.org/documents/SERJ/SERJ13(1)_Reaburn.pdf. Rowe, P. J., & Evans, P. (2007). Ball color, eye color, and a reactive motor skill. PubMed. Retrived from http:www.ncbi.nlm.nih.gov/pubmed/7808908. Rubin, A., Corwin, R. B., & Friel, S. (1997). Data: Kids, cats, and ads. Investigations in number, data, and space. Menlo Park, CA: Dale Seymour Publications. Shaughnessy, J. M., Chance, B., & Kranendonk, H. (2009). Focus on high school mathematics: Reasoning and sense making in statistics and probability. Reston, VA: National Council of Teachers of Mathematics. Watson, J. M. (1998). Assessment of statistical understanding in a media context. In L. S. Pereira-Mendoza, T. W. Kee, & W. Wong (Eds.). Statistical education –expanding the network (Proceedings of the 5th international conference on the teaching of statistics). VoorburgThe Netherlands: International Statistical Institute. [Retrieved from] http://iase-web.org/documents/papers/icots5/Topic6w.pdf. Watson, J. M. (2001). Longitudinal development of inferential reasoning by school students. Educational Studies in Mathematics, 47, 337–372. Watson, J. M. (2006). Statistical literacy at school: Growth and goals. Mahwah, NJ: Lawrence Erlbaum. Watson, J. (2008). 
Eye colour and reaction time: An opportunity for critical statistical reasoning. Australian Mathematics Teacher, 64(3), 30–40. Watson, J., Beswick, K., Brown, N., Callingham, R., Muir, T., & Wright, S. (2011). Digging into Australian data with TinkerPlots. Melbourne: Objective Learning Materials. Watson, J. M., & Callingham, R. A. (2003). Statistical literacy: A complex hierarchical construct. Statistics Education Research Journal, 2(2), 3–46. [Retrieved from] http://iase-web.org/documents/SERJ/SERJ2(2)_Watson_Callingham.pdf. Watson, J. M., & Chick, H. L. (2004). What is unusual? The case of a media graph. In M. Johnsen-HØines, & A. B. Fuglestad (Vol. Eds.), Proceedings of the 28th annual conference of the International Study Group for the Psychology of Mathematics Education: Vol. 2, (pp. 207–214). Bergen, Norway: PME. Watson, J., & English, L. (2015). Introducing the practice of statistics: Are we environmentally friendly? Mathematics Education Research Journal, 27, 585–613. http:// dx.doi.org/10.1007/s13394-015-0153-z. Watson, J., & English, L. (2017). Reaction time in Grade 5: Data collection within the practice of statistics. Statistics Education Research Journal, 16(1), 262–293. [Retrieved from] https://iase-web.org/documents/SERJ/SERJ16(1)_Watson.pdf. Watson, J. M., Fitzallen, N. E., Wilson, K. G., & Creed, J. F. (2008). The representational value of hats. Mathematics Teaching in the Middle School, 14(1), 4–10. Watson, J. M., & Moritz, J. B. (1999). The beginning of statistical inference: Comparing two data sets. Educational Studies in Mathematics, 37, 145–168. Wild, C., & Pfannkuch, M. (1999). Statistical thinking in empirical enquiry. International Statistical Review, 67(3), 223–248. Zieffler, A., & Fry, E. (Eds.). (2015). Reasoning about uncertainty: Learning and teaching informal inferential reasoning. Minneapolis, MN: Catalyst Press.
