Internet and Higher Education 10 (2007) 89 – 101
A comparison of student evaluations of teaching between online and face-to-face courses

Henry F. Kelly a,⁎, Michael K. Ponton b, Alfred P. Rovai b

a Ohio Christian University, 1476 Lancaster Pike, Circleville, OH 43113, USA
b Regent University, 1000 Regent University Drive, Virginia Beach, VA 23464, USA

⁎ Corresponding author. E-mail addresses: [email protected] (H.F. Kelly), [email protected] (A.P. Rovai).
Abstract

The literature contains indications of a bias in student evaluations of teaching (SET) against online instruction compared to face-to-face instruction. The present case study consists of content analysis of anonymous student responses to open-ended SET questions submitted by 534 students enrolled in 82 class sections taught by 41 instructors, one online and one face-to-face class section for each instructor. There was no significant difference in the proportion of appraisal text segments by delivery method, suggesting no delivery method bias existed. However, there were significant differences in the proportion of text segments for topical themes and topical categories by delivery method. Implications of the findings for research and practice are presented.
© 2007 Elsevier Inc. All rights reserved.

Keywords: Assessment; Distance education; Higher education; Instructional effectiveness; Student evaluation of instruction
1. Introduction

Student evaluations of teaching (SET) need to be reliable, valid, and accurate because they are frequently used for high-stakes summative evaluation decisions about instructors, such as promotion, tenure, and merit pay. Therefore, these evaluations should adequately assess the effectiveness of instruction and not be biased by factors outside the instructor's control. In general, SET are believed to be valid, reliable, and worthwhile means of evaluating instructional effectiveness (Braskamp & Ory, 1994; Cashin, 1995; Centra, 1993; d'Apollonia & Abrami, 1997; Feldman, 1997; Marsh & Dunkin, 1997; Marsh & Roche, 2000; McKeachie, 1997; Theall & Franklin, 2001). However, the literature identifies several moderating variables that may bias student evaluations, including academic discipline, class size, content area, grading leniency, level of course, student motivation, teacher personality, type of course requirements, and method of course delivery (face-to-face, distance education).

Bias can be defined as a characteristic of the instructor, course, or student that affects SET, either positively or negatively, but is unrelated to the criteria of good teaching (Centra & Gaubatz, 2000). Therefore, student evaluations
are biased if some characteristic of high-quality teaching tends to cause low student ratings or some characteristic of low-quality teaching tends to cause high student ratings (Martin, 1998). Ideally, all influences other than instructional effectiveness are randomly distributed throughout the student pool and, therefore, do not introduce bias into the SET (Haladyna & Hess, 1994). Unwanted influences, however, are often systematic and do introduce bias into the ratings. Nevertheless, the existence of a significant correlation between SET and some other variable does not necessarily denote the existence of a bias or a threat to validity (Martin, 1998). Systematic differences may be caused by bias or by valid differences in instructional effectiveness. If a variable is related both to student ratings and to other indicators of effective instruction, the validity of the ratings is supported. Conversely, if a variable is related to student ratings without similarly affecting instructional effectiveness, a bias is supported (Marsh, 1987).

If there is a SET bias against online courses compared to face-to-face courses, then evaluations may not be equitably compared. This is of concern because the use of the Internet to deliver courses has risen tremendously over the last several years. The most recent Sloan Consortium survey (Allen & Seaman, 2006) reported that 3.18 million students took online courses in fall 2005, a 35% increase over the previous year. The majority of institutions surveyed agreed that online education is critical to their long-term strategy, and there is no evidence that online enrollment has reached a plateau.

Numerous studies comparing student learning in online and other methods of distance education with face-to-face instruction have resulted in findings of no significant difference (Moore & Thompson, 1997; Russell, 2001), but perceived effectiveness or satisfaction may differ from academic performance results. Because SET are often used for high-stakes personnel decisions, it is vital that they accurately assess teaching effectiveness. However, even if quantitative ratings of online instruction significantly differ from those of face-to-face instruction, these ratings cannot reveal why. Examining qualitative SET comments may help explain differences between how online students and face-to-face students evaluate instructional effectiveness.

Few studies of potential bias in student evaluations of online course delivery have been conducted. The majority of these studies compared SET of only a few online and face-to-face courses; thus, there is a need to review data from many courses across an institution. Knowledge of a SET bias would inform administrators' use of SET for evaluations. The problem addressed in this study is the possibility that the method of course delivery affects SET independent of instructional effectiveness. Therefore, the purpose of this study is to examine students' responses to open-ended questions evaluating instructional effectiveness for both online and face-to-face courses in order to determine whether a potential SET bias exists. The research question of this study is: What are the differences in SET between online and face-to-face courses, as evidenced by a thematic analysis of responses to open-ended questions?

2. Literature review

SET are considered by many researchers to be the single most valid source of data on teaching effectiveness (e.g., Braskamp & Ory, 1994; Centra, 1993; Marsh & Dunkin, 1997).
One view of SET is that they are valid if they accurately reflect students' assessment of instruction quality regardless of the amount of learning that occurred; a second view is that SET are valid if they accurately reflect instructional effectiveness (Abrami, d'Apollonia, & Cohen, 1990). In the second view, a significant correlation would exist between SET and instructional effectiveness scores. Unfortunately, SET are difficult to validate because no single criterion of instructional effectiveness is sufficient or acceptable (e.g., Kulik, 2001; Marsh, 1987). Researchers have correlated SET with various indicators of instructional effectiveness: (a) student learning, (b) changes in student behavior, (c) instructor self-evaluations, (d) evaluations of peers or administrators who attend class sessions, (e) frequency of occurrence of specific behaviors observed by trained observers, and (f) alumni ratings (Kulik, 2001; Marsh & Dunkin, 1997). The most widely accepted criterion of instructional effectiveness is student learning. Cohen (1981) and Feldman (1989) conducted meta-analyses of multi-section courses and found moderate correlations between SET and student learning as measured by examination scores. These studies indicate that students tend to give higher SET ratings to instructors from whom they learn much and lower SET ratings to those instructors from whom they learn little, thus supporting the validity of SET instruments.

Reliability of well-designed SET instruments varies with the number of raters; the more the raters, the higher the reliability (Hoyt & Lee, 2002; Marsh & Dunkin, 1997). Given a sufficient number of raters, the reliability of SET compares favorably with that of the best objective tests (Marsh & Dunkin, 1997). SET have also been shown to be stable over time (Overall & Marsh, 1980), with ratings of instructors given at the end of a course agreeing with ratings given retrospectively one or more
years after finishing the course (r = .83), which indicates that the perspectives of the same students do not change over time.

Many researchers believe SET to be multidimensional. Teaching is a complex activity consisting of multiple dimensions. Therefore, well-designed SET instruments are usually multidimensional, consisting of several items linked to specific dimensions that students can judge to be important to teaching (Marsh, 1987; Marsh & Roche, 1997). Two common methods of identifying factors in SET are (a) an empirical approach primarily using statistical techniques such as factor analysis and (b) a logical analysis of effective teaching and the purposes of the rating, supplemented by reviews of research and feedback from students and instructors (Marsh & Dunkin, 1997). The factors determined from empirical approaches closely agree with those determined through logical analyses, adding validity to the argument for the multidimensionality of SET (Marsh & Dunkin, 1997).

Although many studies show no significant difference in learning between distance learning and face-to-face instruction (Russell, 2001), the results are mixed regarding SET. Some studies found higher SET ratings for online than for face-to-face education, some found lower ratings, and some found no significant difference. Studies indicating lower satisfaction with online courses may reflect the course design and pedagogy used by the instructor (Rovai, 2004), especially for instructors who are new to this delivery medium. Some studies found students favor face-to-face education but found no significant differences in academic achievement. In studies comparing online and face-to-face class offerings, online students perceived their course less positively than face-to-face students (Cheng, Lehman, & Armstrong, 1991; Davies & Mendenhall, 1998; Johnson, Aragon, Shaik, & Palma-Rivas, 1999). In these same studies, there was no significant difference in learning outcomes as measured by test scores or blind reviews of major course projects. Several studies found face-to-face students gave higher SET than online students, but there was no objective measurement of learning (Hoban, Neu, & Castle, 2002; Mylona, 1999; Rivera & Rice, 2002). Some researchers found no significant differences in SET between the two delivery methods but found significantly higher learning in the online class as measured by final exam scores (Campbell, Floyd, & Sheridan, 2002; Navarro & Shoemaker, 2000). Some studies found no significant difference in either SET or learning as measured by test scores (Neuhauser, 2002; Skylar, 2004). Other studies found no significant difference between online and face-to-face SET without a performance measurement (Graham, 2001; Stocks & Freddolino, 1998; Van Schaik & Barker, 2003; Waschull, 2001). Some studies showed that students favor online over face-to-face instruction, but there was no objective measurement of learning (Bryant, 2003; Lim-Fernandes, 2000; Poirier & Feldman, 2004; Wagner, Werner, & Schramm, 2002). Other studies showed that students were generally satisfied with their online class, but there was no face-to-face control group in the study (Lawson, 2000; Phillips & Santoro, 1989; Restauri, King, & Nelson, 2001; Thurmond, Wambach, Connors, & Frey, 2002); comparisons were, therefore, made with other face-to-face classes that participants had taken in the past.

Rovai, Ponton, Derrick, and Davis (2006) conducted the only qualitative analysis to date in an attempt to identify bias in an online course SET.
The researchers found that responses to open-ended questions from online students significantly differed from comments from face-to-face students. Online courses received a greater proportion of both praise and negative criticism, whereas face-to-face courses received a greater proportion of constructive criticism. Online courses received a greater proportion of comments in the clarity and communication skills theme and the teacher–student interaction and rapport theme. Face-to-face courses received a greater proportion of comments in the grading and examinations theme and the student perceived learning theme.

SET may be relatively valid and reliable, but the literature shows there are some factors outside an instructor's control that may affect SET results. Some of these factors are statistically significant but of little practical significance. However, others (e.g., academic discipline and prior interest) require control or at least need to be considered by administrators in their summative evaluations. Trying to isolate potential biases is difficult because of the many complex variables and interactions involved. Additional methodologically sound research is needed on some of these factors. Furthermore, this literature review reveals a clear lack of studies about the qualitative comments contained in SET instruments regarding online education compared with face-to-face education.

3. Method

Qualitative research methods were used to content-analyze student comments obtained from a cross-sectional survey of online and face-to-face students enrolled at a comprehensive, nondenominational Christian university located
in the southern U.S. during academic year 2004–05. These comments were submitted by students in response to open-ended questions in the standard end-of-course survey. Most of the institution's 3443 students were enrolled in graduate programs, with only 7% enrolled in undergraduate programs. The majority of students were women (56%), enrolled part time (58%), and aged 25 and above (88%). The students came from various educational and religious backgrounds, although most were Christian, and they were primarily commuters.

SET data were collected using the University's confidential online end-of-course survey form. Students were informed that their evaluations were anonymous. They were also told that a compilation of all evaluations for each course would be reviewed by their dean and then made available to their instructor after grades for the course had been submitted. Toward the end of the semester, students were sent several emails advising them that they had surveys ready for completion, and multiple reminders were sent until shortly after the semester ended. These SET were anonymous and contained no demographic questions; therefore, no characteristics about participants were available. The evaluation form contained the following questions that were used in this study:

Open-ended:

1. If we redesign this course, what one element would you want to keep in the course?
2. What element would you want to remove from the course?
3. What was your most significant learning experience in this course?
4. What was your greatest disappointment about the course?
5. Are there any additional comments you wish to make about the course (positive or negative)?

Closed-ended, Likert-scale (anchored with 1 = strongly disagree and 6 = strongly agree):

1. Overall, the instructor is effective.
2. Overall, this is an effective course.

Online courses were delivered using the Internet-based Blackboard course management system. Courses contained a combination of reading assignments supplemented by online lecture notes, collaborative online discussions, and assessment tasks requiring individual or group work. Online discussions were generally asynchronous interactions among students moderated by the instructor. Face-to-face courses met on the main campus or at a satellite campus. Course format varied from school to school and program to program, ranging from three 50-minute classes per week to weekend intensives. Some of these courses were supplemented with online Blackboard sites for reading announcements, downloading information, and participating in discussions.

A stratified random sample was taken to provide representative sections for analysis. For each instructor who taught at least one online course and the same course face-to-face during academic year 2004–05, and for which there was sufficient survey data, one course was selected at random. Then one online and one face-to-face section of that course were selected at random. Therefore, 82 sections were represented in the sample: 41 online and 41 face-to-face.

The present study was an extension of the Braskamp, Ory, and Pieper (1981) and Rovai et al. (2006) studies, which resulted in identifying two sets of categories: appraisal and topical. Appraisal categories addressed favorableness, value, worth, or usefulness. The appraisal categories found in the two previous studies were similar, and the present study initially used the appraisal categories found by Rovai et al. Topical categories addressed dimensions of instructional effectiveness. The present study initially used the 22 categories found by Braskamp et al. (1981) instead of the 6 found by Rovai et al. (2006). Using additional topical categories was expected to provide a sharper distinction between information segments so that differences would more clearly emerge.

Standard qualitative research procedures were used (e.g., Creswell, 2002; Patton, 2002). The responses were first divided into text segments, which were paragraphs, sentences, or portions of sentences that related to a single distinct concept (Creswell, 2002). Next, each text segment was manually classified into both an appraisal and a topical category and labeled with a code. The categories were refined as needed to clarify the meaning of each, ensure that data belonging in a certain category held together in a meaningful way (internal homogeneity), and create sharp distinctions among categories (external heterogeneity). This process was continued until saturation was reached. Throughout the process, the researcher looked for additional categories that could emerge. Interpretation primarily consisted of identifying patterns in the data and comparing findings with those reported in the literature. Cross-classifying the data and quantifying qualitative data helped generate insight.
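The cross-classification step lends itself to a simple tabulation. The sketch below is not the authors' workflow (coding was performed manually); it is a minimal illustration, using hypothetical segment records and the pandas library, of how coded text segments can be tallied into the category-by-delivery frequency tables analyzed in the next section.

```python
import pandas as pd

# Hypothetical records: one row per coded text segment.
# Each segment carries a delivery method, an appraisal code, and a topical code.
segments = pd.DataFrame([
    {"delivery": "face-to-face", "appraisal": "praise",                 "topic": "knowledgeable"},
    {"delivery": "online",       "appraisal": "constructive criticism", "topic": "materials"},
    {"delivery": "online",       "appraisal": "praise",                 "topic": "organization"},
    {"delivery": "face-to-face", "appraisal": "negative criticism",     "topic": "grading timeliness"},
    # ... one row for each of the 1742 coded segments
])

# Frequency of appraisal categories by delivery method (cf. Table 2).
appraisal_table = pd.crosstab(segments["appraisal"], segments["delivery"])

# Frequency of topical categories by delivery method (cf. Tables 1 and 3).
topic_table = pd.crosstab(segments["topic"], segments["delivery"])

print(appraisal_table)
print(topic_table)
```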
Quantifying the qualitative data primarily consisted of analyzing the frequency of student responses by category and delivery method through statistical analyses. Two-way contingency analyses of the frequency of student responses by appraisal category and delivery method were performed using the chi-square crosstabulation statistic to determine whether the frequency counts differed significantly by course delivery method. The frequency of topical categories was similarly evaluated. These computations were similar to those used by Rovai et al. (2006). Cells in which the adjusted standardized residuals were ≥ 2.0 in absolute value (α = .05) were considered significant (McClean, 2004).

Most SET bias studies use quantitative analyses to determine whether there are significant differences in dependent variables. Therefore, for comparative purposes, the present study also included standard statistical analyses of the closed-ended SET questions. A one-way multivariate analysis of variance (MANOVA) with one independent variable at two levels (online and face-to-face instruction) and two dependent variables (overall instructor effectiveness and overall course effectiveness) was used to determine whether there were significant differences in the centroid of the dependent variables across the levels of the independent variable.

4. Results

A total of 866 students took the 82 classes included in this study, of which 43.1% (373/866) were in face-to-face classes and 56.9% (493/866) were in online classes. The overall response rate was 61.7% (534/866), with separate response rates of 59.8% (223/373) for face-to-face students and 63.1% (311/493) for online students.

Content analysis resulted in identifying 1742 distinct text segments. Each text segment was coded by both an appraisal category and a topical category. Ninety-five text segments were not coded; these were typically responses of "NA," "None," or "No" to the question "Are there any additional comments you wish to make about the course?" The data were coded in three iterations, with category names and definitions revised to more accurately describe the text segments.

The following are the resulting appraisal categories:

1. Praise: approval or rewarding comments.
2. Constructive criticism: specific suggestions for improvements; includes comments about leaving the course as is or not changing anything.
3. Negative criticism: negative value judgments concerning the course, personal attacks and gratuitous insults directed against the instructor, or derogatory comments with no accompanying suggestion for improvement.

The following are the resulting topical categories:

Instructor:

1. Attitude: interest in, enthusiasm or passion for, and attitude toward teaching, course, and subject matter
2. Rapport: concern and care for individual students (i.e., encouraging, warm, and understanding); knows student names; bias toward a few students
3. Person: professional, Christian, nervous, impatient, amiable, personality, has annoying mannerisms, sincere, expresses humor, self-confident
4. Knowledgeable: well versed in subject matter or material, experienced
5. Stimulation: challenging, interesting, inspiring, creates interest in field, stimulates thought, encourages new ideas
6. Ability: makes clear and concise presentations, talks too fast or too slow, easy to take notes, explains well, uses good examples
7. Preparedness: good classroom management, follows an outline, poorly prepared for class, good organization of lecture
8. Helpfulness: patient, helpful, readily available, spends time with individual students outside class, not helpful, not available, absent
9. Teacher overall: excellent teacher, good/poor role model, desire to take more courses

Course:

10. Organization: clarity of expectations or objectives, even/uneven pace, good/poor course structure, integrated lectures and labs, course missed something general (i.e., not content or assignments)
11. Content: learned subject matter; learned writing, speaking, calculating, selling skills; includes suggestions about specific content but not assignments
12. Materials: textbooks; handouts; audiovisual materials were good, poor, used effectively; lab equipment; software
13. Workload: too many or too few required assignments
14. Lecture: lectures were good, poor, not worth attending; integrative; included use of video instruction and guest speakers
15. Discussion: class interactions were good, relevant; discussions were interesting, superficial, little accomplished in discussions; includes suggestions to keep, eliminate, or change discussions
16. Assignments: suggested assignments, good assignments to keep, bad assignments to eliminate; includes suggestions to eliminate quizzes or exams
17. Overall course: enjoyment of course, good course, course is beneficial, course opens up new ideas, stimulates thinking, did not learn anything or learned a lot without reference to specific assignments or content, no disappointment with course

Grading:

18. Grading fairness: too easy/too strict, ambiguous test questions, unclear grading procedures, changed grading procedures during semester
19. Grading timeliness: feedback or grading took longer than expected
20. Feedback quantity/quality: great feedback, not enough feedback
Table 1
Frequency of text segments by category and delivery method

Topical category                 Praise        Constructive    Negative       Total
                                 F      O      F      O        F      O       F      O
Instructor
 1. Attitude                    13      9      0      0        0      0      13      9
 2. Rapport                     10      9      2      4        1     10      13     23
 3. Person                       9      3      0      0        1      0      10      3
 4. Knowledgeable               21     15      0      0        1      0      22     15
 5. Stimulation                  8      5      0      0        1      0       9      5
 6. Ability                      6      7      0      0        0      1       6      8
 7. Preparedness                 2      3      1      0        6      3       9      6
 8. Helpfulness                  9     15      0      0        4      3      13     18
 9. Teacher overall             31     27      4      6        2      2      37     35
 Instructor total              109     93      7     10       16     19     132    122
Course
 10. Organization                5     21     45     53       30     30      80    104
 11. Content                    97    127     25     22       17     12     139    161
 12. Materials                  21     60     18     38       11     16      50    114
 13. Workload                    0      2      4      3       17     27      21     32
 14. Lecture                     5      3      8     16        4      0      17     19
 15. Discussion                 16     24     17     39       15     26      48     89
 16. Assignments                35     47     49     78       10      8      94    133
 17. Overall course             92    126     29     38        8     15     129    179
 Course total                  271    410    195    287      112    134     578    831
Grading
 18. Grading fairness            1      0      1      0        7     13       9     13
 19. Grading timeliness          2      1      4      3        9      7      15     11
 20. Feedback quantity/quality   5      6      2      2        6     10      13     18
 Grading total                   8      7      7      5       22     30      37     42
Grand total                    388    510    209    302      150    183     747    995

Note. 1742 text segments submitted by 534 students were analyzed. F = face-to-face and O = online.
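As a reader's check (not part of the original analysis), the proportions quoted in Section 4.1 below follow directly from the Table 1 margins:

```python
# Column and cell totals taken from Table 1.
total_f2f, total_online = 747, 995

print(50 / total_f2f, 114 / total_online)    # materials: ~0.067 vs ~0.115 (6.7% vs 11.5%)
print(132 / total_f2f, 122 / total_online)   # instructor theme: ~0.177 vs ~0.123
print(578 / total_f2f, 831 / total_online)   # course theme: ~0.774 vs ~0.835
print(109 / total_f2f, 93 / total_online)    # praise of the instructor: ~0.146 vs ~0.093
print(271 / total_f2f, 410 / total_online)   # praise of the course: ~0.363 vs ~0.412
```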
4.1. Responses to open-ended SET questions

Table 1 presents the frequency of text segments by category and class delivery method. Online students contributed 57.1% (995/1742) of all the text segments, and face-to-face students contributed 42.9% (747/1742). Praise text segments were 51.5% (898/1742) of all text segments, of which 56.8% (510/898) were from online students. Constructive criticism text segments were 29.3% (511/1742) of all text segments, of which 59.1% (302/511) were from online students. Negative criticism text segments were 19.1% (333/1742) of all text segments, of which 55.0% (183/333) were from online students. Of note were the differences in proportions of text segments about materials in face-to-face classes (6.7%) compared to online classes (11.5%); the instructor theme (which aggregated categories 1–9), 17.7% in face-to-face classes and 12.3% online; and the course theme (which aggregated categories 10–17), 77.4% in face-to-face classes and 83.5% online. Further, 14.6% (109/747) of text segments in face-to-face classes praised the instructor compared to only 9.3% (93/995) online. On the other hand, only 36.3% (271/747) of text segments in face-to-face classes praised the course compared to 41.2% (410/995) online.

A two-way contingency table analysis was conducted to evaluate whether the appraisal categories raised by students differed between online and face-to-face classes. The two variables were appraisal category, with three levels (praise, constructive criticism, and negative criticism), and method of class delivery, with two levels (online and face-to-face). The proportion of appraisal categories did not differ significantly by delivery method, Pearson χ2(2, N = 1742) = 1.49, p = .47 (see Table 2).

Table 2
Crosstabulation of appraisal categories by delivery method

Appraisal category        Face-to-face                 Online
Praise                    388 (385.1); 2.9 (0.3)       510 (512.9); −2.9 (−0.3)
Constructive criticism    209 (219.1); −10.1 (−1.1)    302 (291.9); 10.1 (1.1)
Negative criticism        150 (142.8); 7.2 (0.9)       183 (190.2); −7.2 (−0.9)

Note. Cell entries are count (expected count); residual (adjusted standardized residual). 1742 text segments submitted by 534 students were analyzed. Pearson χ2(2, N = 1742) = 1.49, p = .47.

A two-way contingency table analysis was conducted to evaluate whether the topical categories raised by students differed between online and face-to-face classes. The two variables were topical category, with 20 levels, and class delivery method, with two levels (online and face-to-face). The proportion of topical categories differed significantly by delivery method, Pearson χ2(19, N = 1742) = 38.31, p < .01, Cramér's V = .15, indicating a small effect size (see Table 3). The largest adjusted standardized residual was 3.4 for online materials. Other significant adjusted standardized residuals were 2.5 for face-to-face person and 2.1 for face-to-face knowledgeable.

A two-way contingency table analysis was also conducted to evaluate whether the broader topical themes (i.e., instructor, course, and grading) raised by students differed between online and face-to-face classes. The proportion of topical theme text segments differed significantly by delivery method, Pearson χ2(2, N = 1742) = 11.06, p < .01, Cramér's V = .08, indicating a small effect size (see Table 4). The adjusted standardized residuals for face-to-face instructor and online course were significant at 3.2.

4.2. Responses to closed-ended SET questions

The overall descriptive statistics are contained in Table 5. MANOVA results indicate there was no significant difference between class delivery methods, Pillai's Trace = .03, F(2, 79) = 1.15, p = .32. The dependent variables were highly intercorrelated (Pearson product-moment correlation, r = .80).
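The contingency analyses can be reproduced from the published counts. The sketch below applies the Pearson chi-square test to the appraisal frequencies in Table 2 and then computes Cramér's V and the adjusted standardized residuals that the authors screened against ±2.0. SciPy and NumPy are assumptions here, not the software reported in the paper; substituting the counts from Tables 3 or 4 reproduces those analyses in the same way.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Appraisal counts by delivery method, from Table 2.
# Rows: praise, constructive criticism, negative criticism.
# Columns: face-to-face, online.
observed = np.array([
    [388, 510],
    [209, 302],
    [150, 183],
])

chi2, p, dof, expected = chi2_contingency(observed, correction=False)
n = observed.sum()
cramers_v = np.sqrt(chi2 / (n * (min(observed.shape) - 1)))
print(f"chi2({dof}, N={n}) = {chi2:.2f}, p = {p:.2f}, Cramer's V = {cramers_v:.2f}")

# Adjusted standardized residuals: cells with |residual| >= 2.0 are taken to
# contribute significantly to the overall chi-square (alpha = .05).
row_prop = observed.sum(axis=1, keepdims=True) / n
col_prop = observed.sum(axis=0, keepdims=True) / n
adjusted_residuals = (observed - expected) / np.sqrt(
    expected * (1 - row_prop) * (1 - col_prop)
)
print(np.round(adjusted_residuals, 1))
```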
Table 3
Crosstabulation of topical categories by delivery method

Topical category             Face-to-face                  Online
Attitude                     13 (9.4); 3.6 (1.5)           9 (12.6); −3.6 (−1.5)
Rapport                      13 (15.4); −2.4 (−0.8)        23 (20.6); 2.4 (0.8)
Person                       10 (5.6); 4.4 (2.5)           3 (7.4); −4.4 (−2.5)
Knowledgeable                22 (15.9); 6.1 (2.1)          15 (21.1); −6.1 (−2.1)
Stimulation                  9 (6.0); 3.0 (1.6)            5 (8.0); −3.0 (−1.6)
Ability                      6 (6.0); 0.0 (0.0)            8 (8.0); 0.0 (0.0)
Preparedness                 9 (6.4); 2.6 (1.3)            6 (8.6); −2.6 (−1.3)
Helpfulness                  13 (13.3); −0.3 (−0.1)        18 (17.7); 0.3 (0.1)
Teacher overall              37 (30.9); 6.1 (1.5)          35 (41.1); −6.1 (−1.5)
Organization                 80 (78.9); 1.1 (0.2)          104 (105.1); −1.1 (−0.2)
Content                      139 (128.6); 10.4 (1.3)       161 (171.4); −10.4 (−1.3)
Materials                    50 (70.3); −20.3 (−3.4)       114 (93.7); 20.3 (3.4)
Workload                     21 (22.7); −1.7 (−0.5)        32 (30.3); 1.7 (0.5)
Lecture                      17 (15.4); 1.6 (0.5)          19 (20.6); −1.6 (−0.5)
Discussion                   48 (58.7); −10.7 (−1.9)       89 (78.3); 10.7 (1.9)
Assignments                  94 (97.3); −3.3 (−0.5)        133 (129.7); 3.3 (0.5)
Overall course               129 (132.1); −3.1 (−0.4)      179 (175.9); 3.1 (0.4)
Grading fairness             9 (9.4); −0.4 (−0.2)          13 (12.6); 0.4 (0.2)
Grading timeliness           15 (11.1); 3.9 (1.5)          11 (14.9); −3.9 (−1.5)
Feedback quantity/quality    13 (13.3); −0.3 (−0.1)        18 (17.7); 0.3 (0.1)

Note. Cell entries are count (expected count); residual (adjusted standardized residual). 1742 text segments submitted by 534 students were analyzed. Pearson χ2(19, N = 1742) = 38.31, p < .01, Cramér's V = .15.
5. Discussion

The nonsignificant finding from the two-way contingency table analysis for appraisal categories by delivery method was consistent with the results from the MANOVA of responses to the closed-ended questions contained in the SET instrument regarding overall evaluations of the instructor and course. These results suggest congruent validity: both indicate there is no significant difference in students' appraisal of instructional effectiveness between online and face-to-face courses and, therefore, there is no indication of SET bias. The finding that 52% of all text segments were praise and only 19% were negative criticism indicates generally satisfied students, and it appears that both online and face-to-face students were equally satisfied with instruction.

The finding in the present study about appraisal text segments did not agree with that of Rovai et al. (2006), who found a significant difference between online and face-to-face students. About one-half of all text segments in the present study were praise, compared with only about one-third in the Rovai et al. study. Additionally, in the present study 20% of text segments from face-to-face students and 18% from online students were negative criticism, compared with 14% and 22%, respectively, in the Rovai et al. study. The largest difference was with constructive criticism, which accounted for 28% of text segments from face-to-face students and 30% from online students in the present study, compared with 56% and 45%, respectively, in the Rovai et al. study. These differences could be due to the types of courses included in their study, which were research design, statistics, and school counseling courses for doctoral students. There may, in fact, be a SET bias against these types of online courses. It is also possible the instructors involved in the previous study were not as effective in teaching these types of courses online. Additionally, it is possible that the interpretation of categories was not consistent between the two studies.

The two-way contingency table analyses of topical categories and topical themes indicate the proportion of these text segments differed significantly by delivery method. Based upon an analysis of the adjusted standardized residuals, face-to-face students mentioned the topical theme of instructor (3.2) and the topical categories of person (2.5) and knowledgeable (2.1) significantly more than online students. Online students mentioned the topical theme of course (3.2) and the topical category of materials (3.4) significantly more than face-to-face students.

The content analysis in the present study resulted in a slightly different list of topical categories than that found by Braskamp et al. (1981). Additionally, the Braskamp et al. study had a much greater proportion of student comments in the instructor theme (50%) than the present study (15%), a smaller proportion of comments in the course theme (33%) than the present study (81%), and a greater proportion of comments in the grading theme (18%) than the present study (4%). These differences can most likely be attributed to the wording of the open-ended questions. In the Braskamp et al. study, a question specifically elicited comments about the instructor: "What are the major strengths and weaknesses of the instructor?" Additionally, a question specifically educed comments about grading: "Comment on the grading procedures and exams." Furthermore, it is possible the interpretation of categories was not consistent between the two studies. Note that a comparison of proportions within topical categories was not possible because there were changes in definitions that might have caused some text segments to be classified in different categories. However, the Braskamp et al. study's methodology and definitions of instructional effectiveness dimensions were useful as a starting point for the present study.

Table 4
Crosstabulation of topical themes by delivery method

Topical theme   Face-to-face                 Online
Instructor      132 (108.9); 23.1 (3.2)      122 (145.2); −23.1 (−3.2)
Course          578 (604.2); −26.2 (−3.2)    831 (804.8); 26.2 (3.2)
Grading         37 (33.9); 3.1 (0.7)         42 (45.1); −3.1 (−0.7)

Note. Cell entries are count (expected count); residual (adjusted standardized residual). 1742 text segments submitted by 534 students were analyzed. Pearson χ2(2, N = 1742) = 11.06, p < .01, Cramér's V = .08.
Table 5
Descriptive statistics for responses to closed-ended SET questions

Dependent variable   Delivery method   M      SD     N
Instructor           Face-to-face      5.25   0.72   41
                     Online            5.21   0.52   41
Course               Face-to-face      5.03   0.86   41
                     Online            5.14   0.58   41

Note. 534 students completed these SET instruments; 223 in face-to-face classes and 311 in online classes.
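The closed-ended analysis in Section 4.2 can be sketched in the same spirit. The section-level ratings themselves are not published, so the data below are simulated placeholders with roughly the structure described (41 face-to-face and 41 online sections, two six-point items per section); statsmodels is an assumption, not the package the authors report using.

```python
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(0)
n_sections = 41  # sections per delivery method

# Simulated section-level means for the two closed-ended items
# (overall instructor effectiveness and overall course effectiveness).
data = pd.DataFrame({
    "delivery": ["face_to_face"] * n_sections + ["online"] * n_sections,
    "instructor": np.clip(rng.normal(5.2, 0.6, 2 * n_sections), 1, 6),
    "course": np.clip(rng.normal(5.1, 0.7, 2 * n_sections), 1, 6),
})

# One-way MANOVA: two correlated dependent variables, one two-level factor.
# The delivery effect is read from Pillai's trace, as in Section 4.2.
fit = MANOVA.from_formula("instructor + course ~ delivery", data=data)
print(fit.mv_test())
```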
Although the instructor's knowledge was the same whether the course was online or face-to-face, face-to-face students mentioned a significantly greater proportion of text segments in the knowledgeable category, suggesting that face-to-face students considered a knowledgeable instructor more important than online students did. This finding agrees with Sherry and Wilson (1997), who believe online students come to the "realization that the instructor is not the sole authority or repository of answers" (p. 68). It is also consistent with the constructivist theory of learning frequently used in online course design, which posits that instructors are facilitators of student learning rather than dispensers of information (Almala, 2005; Jonassen, Davidson, Collins, Campbell, & Haag, 1995).

Crosstabulation showed a significantly greater proportion of praise text segments in the organization category from online students, suggesting that online students thought course organization was more important than face-to-face students did. This finding is consistent with Laurillard, Stratfold, Luckin, Plowman, and Taylor (2000) and Palloff and Pratt (2001), who believe online course design should provide a clear guide through the learning activities. However, Moore (1991) cautions that too much structure (i.e., rigidity of educational objectives, teaching strategies, and evaluation methods) increases transactional (psychological) distance between the instructor and learners. Similarly, too much course structure is counter to creating an environment where learners construct their own knowledge (Jonassen et al., 1995) and to promoting autonomous, lifelong learning (Ponton, Derrick, & Wiggers, 2004).

There was a significantly greater proportion of text segments in the materials category from online students, suggesting that instructional materials were more important to online students than to face-to-face students. As previously mentioned, face-to-face students thought knowledgeable instructors were more important than online students did; therefore, online students may depend more on instructional materials and less on the instructor than face-to-face students do. These findings are consistent with the significantly greater proportion of text segments in the person category from face-to-face students, indicating that face-to-face students considered the instructor's personal characteristics more important than online students did. These findings support Almala (2005), Jonassen et al. (1995), and Sherry and Wilson (1997), who assert that online education has moved toward resource-based instruction that does not emphasize the instructor as the primary source of information.

5.1. Study limitations

The results of this study are limited in several respects. Since only 62% of enrolled students submitted evaluations, the data were subject to nonresponse bias. No external validation of instructional effectiveness, such as measured student learning, was available; therefore, only perceived student learning was reported. The university studied enrolled primarily graduate students, so few undergraduate courses were represented in the study. Additionally, since only online learning was studied, the results may not be generalizable to other forms of distance education such as televised instruction. Finally, since only one institution was studied, the results may not be generalizable to other institutions.

5.2. Implications for practice

Face-to-face students tended to consider the instructor more important than online students did, and they wanted their instructor to be of good character and knowledgeable in the content. For online students, however, the course was more important than the instructor, with course organization and instructional materials being especially important. Therefore, online course design should provide a clear guide through the learning activities (Laurillard et al., 2000; Palloff & Pratt, 2001; Priest, 2000). Also, instructors should carefully select the instructional materials because the online
instructor is generally no longer considered the repository of information (Sherry & Wilson, 1997). Students still need the instructor, but significant learning will occur through learner–content interaction and learner–learner interaction.

The differences identified in the importance of certain categories between online and face-to-face students underscore the need for instructors to change how they design and teach online classes. The literature identifies the change in roles required to be an effective online instructor, which may include affective, cognitive, managerial, social, and technical roles (Berge, 1995; Coppola, Hiltz, & Rotter, 2002). "Faculty cannot be expected to know intuitively how to design and deliver an effective online course" (Palloff & Pratt, 2001, p. 23). Therefore, online faculty need additional training and instructional support (Howell, Saba, Lindsay, & Williams, 2004), and this support may include assistance from specialists such as instructional designers, editors, graphic designers, librarians, and technicians (Lee, 2001).

Open-ended SET questions are more helpful than typical closed-ended instruments in obtaining responses from students that address the full range of instructional effectiveness dimensions. A typical SET instrument addresses only five dimensions (e.g., the Individual Development and Educational Assessment [IDEA] instrument; Hoyt & Lee, 2002) or perhaps as many as nine (e.g., the Students' Evaluations of Educational Quality [SEEQ] instrument; Marsh, 1982). This compares with the 20 topical categories and three appraisal categories that emerged in the present study. Additionally, the responses to open-ended SET questions contained feedback that should be helpful for improving courses and instruction. However, these open-ended SET questions could be more useful if they elicited responses in all three topical themes (i.e., instructor, course, and grading).

6. Concluding remarks

Although considerable progress has been made, additional research is needed regarding online pedagogy. The academy needs to understand the process of online learning and the techniques that create optimum learning experiences for students. The present study identified similarities and differences between online and face-to-face classes and students, which adds to the body of knowledge about the design and instruction of online classes. However, more theory needs to be developed and tested, specifically regarding how course organization and instructional materials contribute to online students' learning.

References

Abrami, P. C., d'Apollonia, S., & Cohen, P. A. (1990). Validity of student ratings of instruction: What we know and what we do not. Journal of Educational Psychology, 82, 219−231.
Allen, I. E., & Seaman, J. (2006). Making the grade: Online education in the United States. Needham, MA: The Sloan Consortium.
Almala, A. H. (2005). A constructivist conceptual framework for a quality e-learning environment. Distance Learning, 2(5), 9−12.
Berge, Z. L. (1995). Facilitating computer conferencing: Recommendations from the field. Educational Technology, 35(1), 22−30.
Braskamp, L. A., & Ory, J. C. (1994). Assessing faculty work: Enhancing individual and instructional performance. San Francisco: Jossey-Bass.
Braskamp, L. A., Ory, J. C., & Pieper, D. M. (1981). Student written comments: Dimensions of instructional quality. Journal of Educational Psychology, 73, 65−70.
Bryant, F. K. (2003). Determining the attributes that contribute to satisfaction among marketing students at the university level: An analysis of the traditional/lecture method versus the internet mode of instruction. ProQuest Digital Dissertations (UMI No. 3093426).
Campbell, M. C., Floyd, J., & Sheridan, J. B. (2002). Assessment of student performance and attitudes for courses taught online versus onsite. Journal of Applied Business Research, 18(2), 45−51.
Cashin, W. E. (1995). Student ratings of teaching: The research revisited (IDEA Paper No. 32). Manhattan, KS: Kansas State University, Center for Faculty Evaluation and Development in Higher Education.
Centra, J. A. (1993). Reflective faculty evaluation: Enhancing teaching and determining faculty effectiveness. San Francisco: Jossey-Bass.
Centra, J. A., & Gaubatz, N. B. (2000). Is there gender bias in student evaluations of teaching? Journal of Higher Education, 71(1), 17−33.
Cheng, H. C., Lehman, J., & Armstrong, P. (1991). Comparison of performance and attitude in traditional and computer conferencing classes. American Journal of Distance Education, 5(3), 51−64.
Cohen, P. A. (1981). Student ratings of instruction and student achievement: A meta-analysis of multisection validity studies. Review of Educational Research, 51, 281−309.
Coppola, N. W., Hiltz, S. R., & Rotter, N. (2002). Becoming a virtual professor: Pedagogical roles and asynchronous learning networks. Journal of Management Information Systems, 18, 169−189.
Creswell, J. W. (2002). Educational research: Planning, conducting, and evaluating quantitative and qualitative research. Upper Saddle River, NJ: Merrill Prentice Hall.
d'Apollonia, S., & Abrami, P. C. (1997). Navigating student ratings of instruction. American Psychologist, 52, 1198−1208.
Davies, R. S., & Mendenhall, R. (1998). Evaluation comparison of online and classroom instruction for HEPE 129-fitness and lifestyle management course. Provo, UT: Department of Instructional Psychology and Technology, Brigham Young University.
Feldman, K. A. (1989). The association between student ratings of specific instructional dimensions and student achievement: Refining and extending the synthesis of data from multisection validity studies. Research in Higher Education, 30, 583−645.
Feldman, K. A. (1997). Identifying exemplary teachers and teaching: Evidence from student ratings. In R. P. Perry & J. C. Smart (Eds.), Effective teaching in higher education: Research and practice (pp. 368−395). New York: Agathon Press.
Graham, T. A. (2001). Teaching child development via the internet: Opportunities and pitfalls. Teaching of Psychology, 28(1), 67−71.
Haladyna, T., & Hess, R. K. (1994). The detection and correction of bias in student ratings of instruction. Research in Higher Education, 35, 669−687.
Hoban, G., Neu, B., & Castle, S. R. (2002, April). Assessment of student learning in an educational administration online program. Paper presented at the Annual Meeting of the American Educational Research Association, New Orleans, LA. (ERIC Document Reproduction Service No. ED427752)
Howell, S. L., Saba, F., Lindsay, N. K., & Williams, P. B. (2004). Seven strategies for enabling faculty success in distance education. Internet and Higher Education, 7, 33−49.
Hoyt, D. P., & Lee, E. J. (2002). Basic data for the revised IDEA system (IDEA Technical Report No. 12). Manhattan, KS: Kansas State University, The Individual Development and Educational Assessment Center.
Johnson, S. D., Aragon, S. R., Shaik, N., & Palma-Rivas, N. (1999, October). Comparative analysis of online vs. face-to-face instruction. Paper presented at the WebNet 99 World Conference on the WWW and Internet, Honolulu, HI.
Jonassen, D., Davidson, M., Collins, M., Campbell, J., & Haag, B. B. (1995). Constructivism and computer-mediated communication in distance education. American Journal of Distance Education, 9(2), 7−26.
Kulik, J. A. (2001). Student ratings: Validity, utility, and controversy. In M. Theall, P. C. Abrami, & L. A. Mets (Eds.), New directions for institutional research: No. 109. The student ratings debate: Are they valid? How can we best use them? (pp. 9−26). San Francisco: Jossey-Bass.
Laurillard, D., Stratfold, M., Luckin, R., Plowman, I., & Taylor, J. (2000). Affordances for learning in a non-linear narrative medium. Journal of Interactive Media in Education, 2. Retrieved March 4, 2006, from http://www.jime.open.ac.uk/00/2
Lawson, T. J. (2000). Teaching a social psychology course on the web. Teaching of Psychology, 27, 285−289.
Lee, J. (2001). Instructional support for distance education and faculty motivation, commitment, satisfaction. British Journal of Educational Technology, 32(2), 153−160.
Lim-Fernandes, M. (2000). Assessing the effectiveness of online education. ProQuest Digital Dissertations (UMI No. 9996938).
Marsh, H. W. (1982). Factors affecting students' evaluations of the same course taught by the same instructor on different occasions. American Educational Research Journal, 19, 485−497.
Marsh, H. W. (1987). Students' evaluations of university teaching: Research findings, methodological issues, and directions for future research. International Journal of Educational Research, 11, 253−388.
Marsh, H. W., & Dunkin, M. J. (1997). Students' evaluations of university teaching: A multidimensional perspective. In R. P. Perry & J. C. Smart (Eds.), Effective teaching in higher education: Research and practice (pp. 241−320). New York: Agathon Press.
Marsh, H. W., & Roche, L. A. (1997). Making students' evaluations of teaching effectiveness effective: The critical issues of validity, bias, and utility. American Psychologist, 52, 1187−1197.
Marsh, H. W., & Roche, L. A. (2000). Effects of grading leniency and low workload on students' evaluations of teaching: Popular myth, bias, validity, or innocent bystanders? Journal of Educational Psychology, 92, 202−228.
Martin, J. R. (1998). Evaluating faculty based on student opinions: Problems, implications and recommendations from Deming's theory of management perspective. Issues in Accounting Education, 13, 1079−1094.
McClean, M. (2004). The choice of role models by students at a culturally diverse South African medical school. Medical Teacher, 26(2), 133−141.
McKeachie, W. J. (1997). Student ratings: The validity of use. American Psychologist, 42, 1218−1225.
Moore, M. (1991). Editorial: Distance education theory. American Journal of Distance Education, 5(3), 1−6.
Moore, M. G., & Thompson, M. M. (1997). The effects of distance learning (American Center for the Study of Distance Education Research Monograph No. 15). University Park, PA: Pennsylvania State University, American Center for the Study of Distance Education.
Mylona, Z. H. (1999). Factors affecting enrollment satisfaction and persistence in a web-based, video-based, and conventional instruction. ProQuest Digital Dissertations (UMI No. 9933756).
Navarro, P., & Shoemaker, J. (2000). Performance and perceptions of distance learners in cyberspace. American Journal of Distance Education, 14(2), 15−35.
Neuhauser, C. (2002). Learning style and effectiveness of online and face-to-face instruction. American Journal of Distance Education, 16(2), 99−113.
Overall, J. U., & Marsh, H. W. (1980). Students' evaluations of instruction: A longitudinal study of their stability. Journal of Educational Psychology, 72, 321−325.
Palloff, R. M., & Pratt, K. (2001). Lessons from the cyberspace classroom. San Francisco: Jossey-Bass.
Patton, M. Q. (2002). Qualitative research and evaluation methods (3rd ed.). Thousand Oaks, CA: Sage Publications.
Phillips, G. M., & Santoro, G. M. (1989). Teaching group discussion via computer-mediated communication. Communication Education, 38, 151−161.
Poirier, C. R., & Feldman, R. S. (2004). Teaching in cyberspace: Online versus traditional instruction using a waiting-list experimental design. Teaching of Psychology, 31(1), 59−62.
Ponton, M. K., Derrick, M. G., & Wiggers, N. G. (2004). Using asynchronous e-learning to develop autonomous learners. In G. Richards (Ed.), Proceedings of World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2004 (pp. 1437−1441). Chesapeake, VA: Association for the Advancement of Computing in Education.
Priest, L. (2000). The story of one learner: A student's perspective on online teaching. In K. W. White & B. H. Weight (Eds.), The online teaching guide: A handbook of attitudes, strategies, and techniques for the virtual classroom (pp. 37−44). Needham Heights, MA: Allyn and Bacon.
Restauri, S. L., King, F. L., & Nelson, J. G. (2001). Assessment of students' ratings for two methodologies of teaching via distance learning: An evaluative approach based on accreditation. Jacksonville, AL: Jacksonville State University. (ERIC Document Reproduction Service No. ED460148)
Rivera, J. C., & Rice, M. L. (2002). A comparison of student outcomes and satisfaction between traditional and web based course offerings. Online Journal of Distance Learning Administration, 5. Retrieved September 22, 2005, from http://www.westga.edu/~distance/ojdla/fall53/rivera53.html
Rovai, A. P. (2004). A constructivist approach to online college learning. Internet and Higher Education, 7, 79−93.
Rovai, A. P., Ponton, M. K., Derrick, M. G., & Davis, J. M. (2006). Student evaluation of teaching in the virtual and traditional classrooms: A comparative analysis. Internet and Higher Education, 9, 23−35.
Russell, T. L. (2001). The no significant difference phenomenon: A comparative research annotated bibliography on technology for distance education. Montgomery, AL: International Distance Education Certification Center.
Sherry, L., & Wilson, B. (1997). Transformative communication as a stimulus to web innovations. In B. H. Khan (Ed.), Web-based instruction (pp. 67−74). Englewood Cliffs, NJ: Educational Technology Publications.
Skylar, A. A. (2004). Distance education: An exploration of alternative methods and types of instructional media in teacher education. ProQuest Digital Dissertations (UMI No. 3144530).
Stocks, J. T., & Freddolino, P. P. (1998). Evaluation of a world wide web-based graduate social work research methods course. Computers in Human Services, 15(2/3), 51−69.
Theall, M., & Franklin, J. (2001). Looking for bias in all the wrong places: A search for truth or a witch hunt in student ratings of instruction. In M. Theall, P. C. Abrami, & L. A. Mets (Eds.), New directions for institutional research: No. 109. The student ratings debate: Are they valid? How can we best use them? (pp. 45−56). San Francisco: Jossey-Bass.
Thurmond, V. A., Wambach, K., Connors, H. R., & Frey, B. B. (2002). Evaluation of student satisfaction: Determining the impact of a web-based environment by controlling for student characteristics. American Journal of Distance Education, 16(3), 169−189.
Van Schaik, P., Barker, P., & Beckstrand, S. (2003). A comparison of on-campus and online course delivery methods in southern Nevada. Innovations in Education and Teaching International, 40(1), 5−15.
Wagner, R., Werner, J. M., & Schramm, R. (2002, August). An evaluation of student satisfaction with distance learning courses. In Proceedings of the 18th Annual Conference on Distance Teaching and Learning (pp. 419−423). Madison, WI.
Waschull, S. B. (2001). The online delivery of psychology courses: Attrition, performance, and evaluation. Teaching of Psychology, 28, 143−147.