Relational event models for social learning in MOOCs



Social Networks 43 (2015) 121–135

Contents lists available at ScienceDirect

Social Networks journal homepage: www.elsevier.com/locate/socnet

Duy Vu a,∗, Philippa Pattison b, Garry Robins a

a University of Melbourne, VIC 3010, Australia
b University of Sydney, NSW 2006, Australia

Article info

Keywords: Relational event models; Social learning analytics; MOOCs

Abstract

We propose three extensions for the relational event framework to model the co-evolution of multiple network event streams which are increasingly available thanks to the explosive growth of online applications. Firstly, a flexible stratification approach is considered to allow for more complex data structures with many types of nodes and events. Secondly, an inference method that combines nested case–control sampling with stratification is discussed to scale the approach to very large data sets. Finally, a suite of new temporal and network statistics is introduced to reflect the potentially complex dependencies among events and observed heterogeneities on nodes and edges. The empirical value of the new extensions is demonstrated through an analysis of social learning in Massive Open Online Courses (MOOCs). In particular, three modeling problems are considered from the network perspective: (1) the utility of social factors, performance indicators, and clickstream behaviors in the prediction of course dropout, (2) the social and temporal structure of learner interactions across discussion threads, and (3) the forms of mutual dependence of social learning interactions on prior learning success, and future learning success on forms of prior social learning interaction. © 2015 Elsevier B.V. All rights reserved.

∗ Corresponding author. Tel.: +61 3 8344 5550. E-mail addresses: [email protected] (D. Vu), [email protected] (P. Pattison), [email protected] (G. Robins).
http://dx.doi.org/10.1016/j.socnet.2015.05.001
0378-8733/© 2015 Elsevier B.V. All rights reserved.

1. Introduction

The increasing growth of online social platforms such as Facebook, MOOCs, and GitHub has provided people with more opportunities to connect, learn, and collaborate. Large activity data streams generated by different parts of an application or across multiple applications allow us to explore in detail the dynamics of user interactions and behaviors, which in turn can help to design better support tools and improve users' future experience. In MOOCs, for example, besides user activities on forums and assignments, clickstream data on learning behaviors such as video interaction events can also be observed. To understand the social aspect of these online learning processes and hence improve learner engagement, it is important to model the interdependence among learners’ behaviors and interactions across these event streams. This paper will demonstrate how relational event models (Butts, 2008) can provide us with a methodological framework to achieve this goal, even with complex data sets at the Web scale.

Currently, there are three statistical modeling approaches that can take account of local network structures. Each framework aims at a different network data type. First, Frank and Strauss (1986) proposed exponential random graph models (ERGMs) for cross-sectional networks where the general Markov dependence assumption allows us to model both local structures and other observed characteristics of nodes and edges (Robins et al., 2007). The inadequacies of Markov dependence and newer social circuit dependence specifications can be found in Snijders et al. (2006) while a thorough discussion on computationally intensive Markov Chain Monte Carlo inference is given by Hunter and Handcock (2006). Recent advances in ERGMs for multi-level network data of different types of edges among different sets of nodes have been discussed in Wang et al. (2013, 2013).

The second modeling approach is stochastic actor-oriented models (SAOMs) (Snijders, 2001), which are suitable for panel network data. Based on the Markov process assumption, this framework models the probabilities of edge and behavior changes as functions of the current network itself and other observed characteristics of nodes and edges. Consequently, competing structural and behavioral tendencies that simultaneously drive the dynamics of network processes can be jointly estimated under SAOMs. A non-technical introduction to SAOMs and their representative applications can be found in Snijders et al. (2010). Similar to ERGMs, statistical inference for SAOMs is also carried out by time-consuming MCMC procedures that simulate micro changes between discrete-time network observations (Schweinberger and Snijders, 2007).


The third modeling approach for time-stamped network data is relational event models (REMs) (Butts, 2008). Recent advances in statistical inference for REMs have been explored by Perry and Wolfe (2013) while a range of different applications has been demonstrated in Brandes et al. (2009), Vu et al. (2011), DuBois et al. (2013), Quintane et al. (2013), and Lomi et al. (2014). Compared to ERGMs and SAOMs, statistical inference for REMs is less intensive thanks to the tractability of partial likelihoods (Cox, 1972) and the sparsity of network statistic changes (Vu et al., 2011; Perry and Wolfe, 2013). Online social networks of tens of thousands of nodes have been successfully analyzed using this partial likelihood inference method (Salathé et al., 2013). The most appealing feature of REMs, however, is their ability to take full advantage of time-stamped activity data which are continuously monitored and recorded in online applications. Using ERGMs or SAOMs, and thus collapsing these fine-grained temporal data into one cross-sectional network or a few discrete-time network snapshots, could result in the loss of information on the dynamics of network processes. Consequently, the estimation of network effects could be severely biased.

Motivated by a new problem of network analysis in social learning, this paper seeks to make four significant contributions in relational event modeling. Firstly, based on the counting process approach in survival and event history analysis (Andersen et al., 1993), we discuss a flexible stratification method to model multi-mode and multiplex network event streams. Such data are increasingly available in online applications, but relational event models for them have not been fully considered. In online learning systems, for example, many relational event streams on user activities can be simultaneously recorded, including direct instant messages among learners, forum posts between learners and discussion threads, user assignment submissions, and video interaction events. These diverse collections of learner interactions and behaviors are interdependent and need to be analyzed in a joint modeling framework. For example, by modeling both post events from users to discussion threads and submission events between users and quizzes, the bi-directional relationship between learning performance and social interactions in forums can be uncovered.

Our second contribution is a faster estimation procedure to address the computational challenge in the relational event framework (Butts, 2008, Section 2.3). Although recent progress has been made based on the sparsity property of count-based network statistics (Perry and Wolfe, 2013; Vu et al., 2011), our introduction of new temporal statistics requires other efficient alternatives. In particular, we discuss the application of nested case–control sampling (Borgan et al., 1995) in relational event modeling and its combination with stratification, which could help to increase the sampling efficiency.

Our third contribution is a set of advanced network statistics that can take account of observed heterogeneities on nodes and edges as well as complex relational and temporal dependencies among event streams. For example, to model the interdependence among different interaction processes in MOOCs such as forum activities and quiz submissions, we introduce a set of network statistics that are calculated from multiple event types on different sets of nodes. These statistics, for instance, allow us to test whether high-performance learners tend to engage more with each other over time in discussion threads of common interest than low-performance ones. Another example is our application of recency statistics, which are broadly used in survival analysis (Aalen et al., 2008), to test for the tendency of learners’ behavior events to be clustered rather than equally distributed over time.

Our last contribution is a novel application of REMs in social learning analytics (Shum and Ferguson, 2012). In order to explore the role of social structures in emergent networked learning environments, we consider three modeling problems of course dropout, quiz performance, and forum discussion. Our analysis, for example, shows that learners with high cumulative quiz scores are more likely to engage in forum discussions. However, high activity in forums is not associated with better quiz scores, though it predicts a lower likelihood of dropping out. The analysis also uncovers interesting network structures in forum discussion such as four-cycle closure effects where learners, especially those with high quiz scores, tend to maintain their knowledge exchange collaboration over time.

The rest of this paper is organized as follows. In Section 2, we briefly introduce the emerging field of social learning analytics and the structure of relational event streams in MOOCs using a case-study data set from Coursera. The section concludes with an outline of the main research questions that will be considered in our social learning analysis. The flexible stratification approach for multi-mode and multiplex event streams is then presented in Section 3, followed by a discussion in Section 4 on the nested case–control sampling and stratification procedure that can scale statistical inference for REMs to large data sets. Section 5 explores a set of advanced network statistics that substantially extend the modeling capability of REMs. Our analysis of social learning in MOOCs is presented in Section 6 to demonstrate the empirical value of all proposed extensions for REMs. Section 7 sketches out some research problems in relational event modeling that will be investigated in our future work to further promote its applications.

2. Social learning in MOOCs

The field of learning analytics in higher education is burgeoning (Siemens and Gasevic, 2012; Siemens, 2013). Interest has been fueled by the increasing use of learning management systems (LMSs) in universities and other post-secondary educational institutions and the recognition – and early demonstration – that the data recorded in these systems can be used to improve student retention and student learning (Tanes et al., 2011; Arnold and Pistilli, 2012). It has also been boosted by the very large volume of continuous-time clickstream data generated by learners in Massive Open Online Courses. MOOC data are distinctive not only because they capture a substantial proportion of students’ learning-focused interactions in online learning communities that are literally distributed around the globe, but also because of the large numbers of learners involved.

While much learning analytics literature has focused on the effect of the conditions created for learning as well as individual learner behavior on successful learning, research has increasingly turned to the role of social interaction in understanding students’ learning behavior and learning success (Dawson, 2010; Haythornthwaite and Andrews, 2011). As educational theorists have long argued, learning is situated within a particular sociocultural context, and peer-to-peer interaction plays a vital role not only in building and maintaining engagement with learning activities but also in supporting forms of collaborative exchange that promote learning (Blackmore, 2010; Laurillard, 2001).

Shum and Ferguson (2012) propose the term “social learning analytics” to refer to a distinctive subset of learning analytics that is socially situated. Social learning analytics draws on the substantial body of work demonstrating that new skills and ideas are not solely individual achievements, but are developed, carried forward, and passed on through interaction and collaboration. Its goal is to identify behavior and patterns of interaction that promote learning. This goal is important from both theoretical and practical perspectives: theoretically, because it explicates the important role of social interactions in creating opportunities for learning; practically, because current social interactions are a potentially powerful predictor of future social interactions and hence future learning opportunities and outcomes.


Fig. 1. The cumulative plots of learner registration and interaction events. Learners continued registering for the course even after its start date on March 31, 2013. All 6,654,585 time-stamped interaction events recorded by the Coursera system are used to construct the active status of learners. Learners still registered and interacted with the course website after its close date on May 31, 2013, as shown in the cumulative plot of interaction events.

Efforts to understand patterns of social interaction within learning communities have included methods for visualizing and analyzing social interaction networks and have demonstrated the promise of including social learning interactions within the broader learning analytics orbit (Bakharia and Dawson, 2011; Dawson, 2010; Haythornthwaite and de Laat, 2010). However, they have often been applied to relatively small classes or to observations aggregated across time. They have therefore taken limited account of the fine temporal resolution in the captured interaction data, especially for continuous-time event streams for learner interactions in LMSs and MOOCs. These existing methods are also limited in their capacity to reveal complex regularities in very large interaction networks. In addition, the relational character of social learning interaction data makes it difficult to use general-purpose data mining approaches to reveal patterns involving interactions among multiple learners. As a result, new methods are needed to detect and analyze the structure of learning interactions and the consequences for learning.

2.1. Relational event streams

To motivate the potential of REMs in social learning analytics, the Principles of Macroeconomics course on Coursera, one of the leading MOOC platforms, will be used as our case study. The course was offered by the University of Melbourne from March 31 to May 31, 2013. Fig. 1(a) shows the cumulative count of registered learners over time. The total number of enrolled learners is 66,286, including those joining the course after its start date. However, there are 28,542 learners who did not appear after enrollment. All time-stamped interaction events between learners and the course website recorded by the Coursera system, such as forum posts and video views, are used to filter out these inactive registered users. Fig. 1(b) plots the cumulative count of interaction events. Furthermore, there are 4217 learners who registered after the course close date. In this illustrative study, we will only consider the set of 33,527 learners who were actually active during the course open time. We show in detail how to incorporate these data on learner active status into relational event models in Section 3. From the network process perspective, these plots and descriptive numbers demonstrate one key benefit of the data richness in MOOCs: we can continuously keep track of those actors who are actually participating in the network, which in turn can substantially help to reduce biases in the estimation of network effects.

We further define the lifetime of a learner by two time points: enter and exit times. If a learner registers before the class commencement, her enter time is the course open time. Otherwise, her enter time is her registration time. Moreover, a learner will be considered as dropping out if she has no further activity for 3 days after her last interaction with the course website. Fig. 2(a) and (b) shows the cumulative plots of learner enter and exit events, respectively. The dropout rate is very high at the beginning of the course but decreases over time. One week before the end of the course, dropouts started to pick up again, perhaps because a large number of learners stopped using the website after achieving the pass requirement. Fig. 2(c) shows the lifetime distribution of learners who were active during the course open time. Besides confirming the high dropout rates at the beginning and the end of the course, the plot reveals that many learners still interacted with the website after its close date, i.e. their lifetimes are greater than 60 days. Since our interest is in their social learning processes during the course open time, our analysis will only focus on the time window from March 31 to May 31, 2013.

An important part of all MOOC platforms is their tools to facilitate interactions among learners and promote peer-based learning. In Coursera, learners may open discussion threads to ask questions and then receive answers from other students as posts and comments. Comments are forum posts on other posts. There are 958 discussion threads in the data set, of which 944 threads were opened before the course close date. Fig. 3(a) shows the cumulative plot of thread open events. Each thread is assigned to a forum. There are 14 forums including 8 lecture forums, one for each week.
Table 1 shows the distribution of the 944 threads created before the course close date across the 14 forums. During the course open time, there are 4525 posts and 2616 comments from learners to discussion threads, which results in a total of 7141 events.¹ From this point, we will not distinguish between posts and comments and will use the term “post events” for both of them. Fig. 3(b) shows the cumulative plot of post events during the course open time. To promote discussion and knowledge sharing, forum users in Coursera can also vote a post up, down, or neutral. Fig. 3(c) shows the cumulative plot of forum vote events during the course open time. Among 6719 votes, there are 5943 up votes and 657 down ones.

The forum-aggregated cumulative plot of post events conceals two important issues in modeling forum post activities: forum event rates not only vary over time but are also different across forums. The cumulative plot of 7145 post events stratified by their assigned forums in Fig. 4(a) illustrates these issues. If post event rates vary over time and across forums, a forum-stratified model that allows the baseline rates of post events to differ across forums should be used, together with a non-parametric form for these baseline rates. In Section 3, we will discuss this stratification approach in detail.

¹ Four other posts were created before the course start date, which results in a total of 7145 post and comment events created before the course close date.

Fig. 2. The cumulative plots for enter and exit events and the lifetime histogram of learners who are active during the course open time. Enter times of learners who registered before the class commencement are truncated to its start date.

Fig. 3. The cumulative plots for thread open, post and vote events during the course open time.

Table 1. The distribution of 944 discussion threads and their 7145 posts across 14 forums. Only discussion threads and posts created before the course close date are included in the table.

Forum                       Number of threads   Number of posts
Forums                                     63               593
General discussion                        168              1087
Assignments                               136               937
Study groups                               37               475
Course material feedback                   69               300
Technical feedback                         31              1139
Lectures – Week 1                          99               102
Lectures – Week 2                          73               578
Lectures – Week 3                          68               513
Lectures – Week 4                          57               371
Lectures – Week 5                          46               328
Lectures – Week 6                          32               212
Lectures – Week 7                          33               173
Lectures – Week 8                          32               337

Fig. 4(b) shows the bipartite network between learners and discussion threads in the forum of Week 3, while Fig. A.1 in the supplementary material visualizes the whole forum network. Despite its static nature, the plot helps to reveal two important network patterns. First, some discussion threads receive more attention from many learners, which can potentially be explained by the preferential attachment effect (Yule, 1925). Second, the plot suggests the existence of a clustering effect, i.e. many learners seem to discuss together across different threads. In Section 5 and Appendix B, we will introduce a suite of network statistics that can capture these complex dependencies. One advantage of the relational event framework is its ability to preserve the temporal nature of MOOC data and allow us to test such dynamic network effects in a joint fashion.

In the Principles of Macroeconomics course, learners were evaluated by three graded quizzes and one peer-graded assignment. Each quiz or assignment accounts for 25% of the final grade, and learners who achieve more than 50% of the maximum total score are awarded a statement of accomplishment. Due to the lack of reliability of the peer-grading process, our analysis will focus only on the three graded quizzes. Fig. 5(a) shows the cumulative plots of submission events for these graded quizzes. Since their start dates and deadlines are different, at any time point their event rates are also different, supporting our later modeling choice that stratifies the baseline rates by quizzes. There is, however, a similar submission pattern across the three quizzes: their rates accelerate toward the deadlines, which are nearly 9 days after the start dates. This time-varying pattern of quiz submission rates is the main reason for our non-parametric choice of their baseline rates.
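As a hedged illustration of the stratified cumulative counts discussed above (per-forum post events, per-quiz submission events), the following minimal Python sketch builds one cumulative count curve per stratum from a list of time-stamped events. The function name `cumulative_counts_by_stratum`, the `(time, stratum)` event layout, and the toy values are assumptions made for illustration; they are not taken from the Coursera data or from the authors' code.

```python
from collections import defaultdict


def cumulative_counts_by_stratum(events):
    """Build one cumulative event-count curve per stratum.

    events: iterable of (time, stratum) pairs, in any order.
    Returns a dict mapping each stratum to a time-ordered list of
    (time, cumulative_count) points.
    """
    curves = defaultdict(list)
    for time, stratum in sorted(events):
        curve = curves[stratum]
        # The count after this event is the previous count plus one.
        count = curve[-1][1] + 1 if curve else 1
        curve.append((time, count))
    return dict(curves)


# Toy post events: (days since course start, forum label); values are made up.
toy_events = [
    (1.0, "Week 1"),
    (2.5, "Week 1"),
    (15.0, "Week 3"),
    (3.0, "Week 1"),
    (16.2, "Week 3"),
]
curves = cumulative_counts_by_stratum(toy_events)
```

In this toy input the "Week 1" curve rises early and the "Week 3" curve only starts around day 15, which is the kind of difference across strata that an aggregated curve would hide.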
Moreover, since grading is done automatically, learners can submit a limited number of solutions for each quiz. Table 2 shows the distributions of learners by their numbers of submissions for each quiz. Fig. 5(b) shows the histograms of quiz submission scores. The distributions of these scores are similar for all three quizzes. As a result of this homogeneity, we propose to use one regression model for all quiz scores rather than stratify them into three different ones.

Fig. 4. (a) The cumulative plots for forum post events stratified by 14 forums. (b) The bipartite network between learners and discussion threads in the Week 3 forum, where edges are plotted proportionally to the numbers of post events between learners and threads.

Fig. 5. The cumulative plot of submission events and the histogram of submission scores from three graded quizzes at Weeks 3, 5, and 8.

Table 2. The distributions of learners by their numbers of submissions across three graded quizzes.

            Number of submissions
Quiz           1       2       3      ≥4
Week 3      1299    2272      57      34
Week 5       764     969     794      24
Week 8       490    1016      11      20

One advantage of online learning platforms is their capability to track all learning activities continuously. In Coursera, learners’ interactions with course materials are all recorded in millions of clickstream records. We extract three types of clickstream events that reflect learning behaviors: forum, wiki, and video views. In addition to discussion forums and lecture videos, wiki pages in Coursera are set up to provide users with learning documents. Fig. 6(a) and (b) shows the cumulative plots of forum and wiki view events aggregated over all learners during the course open time. These view event rates decrease toward the end of the course due to dropping out. Fig. 6(c) shows the histogram of cumulative times that learners have spent on watching lecture videos. In Section 6, we will explore the predictive potential of behavioral statistics that are constructed from these clickstream data.

Fig. 6. (a) The cumulative plot of forum view events. (b) The cumulative plot of wiki view events. (c) The histogram of learner video view times.

2.2. Social learning research questions

To demonstrate the utility of relational event models in social learning analytics, we consider three modeling problems in our case study: building dynamic predictors for dropout events, uncovering the social and temporal structure of learner interactions, and exploring the bi-directional relationship between social interactions and learning success.

Firstly, the capacity to predict course dropout has been an early success of learning analytics (Tanes et al., 2011). The extended REM framework allows us to develop a suite of dynamic statistics and assess what value these statistics add to existing risk factors in the prediction of learner dropout. In particular, we expect forum interaction and learning performance indicators to have valuable predictive capacity for dropout probabilities. On top of these predictors, behavioral statistics based on clickstream data could also add more predictive power.

Secondly, our extension in modeling of multi-mode and multiplex events also provides more powerful means for characterizing complex social and temporal regularities in interaction patterns among learners. The first example of a meaningful interaction pattern is emerging four-cycle collaboration structures in information exchange among learners, especially among high-performance ones who have achieved good quiz scores. Another example is the role of the forum voting scheme in encouraging discussions among learners. Such hypothesis tests require the computation of network statistics that depend on many sets of nodes and event types.

Our final problem is to evaluate the forms of mutual dependence of social learning interactions on prior learning success, and future learning success on forms of prior social learning interaction. In particular, we would like to ask if learners who are highly active in forum discussions are more likely to get better quiz scores, and vice versa, if learners with high quiz scores tend to engage more with the knowledge exchange process. Findings or hypotheses of this form will ultimately be vital to effective designs and interventions for learning, such as building personalized recommendation tools that can route highly informative topics contributed by high-performance learners to other users.
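Since the dropout question relies on the learner lifetimes defined in Section 2.1, a minimal Python sketch of that definition may help fix ideas. It assumes times are measured in days since the course open date, that the exit time is the time of the last recorded interaction, and that dropout is judged at a hypothetical observation time `now_day`. The function name, parameter names, and default values are illustrative assumptions, not the authors' implementation.

```python
def learner_lifetime(registration_day, interaction_days,
                     course_open_day=0.0, now_day=61.0, gap_days=3.0):
    """Compute enter time, exit time, and dropout status for one learner.

    registration_day: registration time in days since course open
        (negative if the learner registered before the course opened).
    interaction_days: times of recorded interactions, in days.
    Returns None for learners with no recorded interaction, since
    such learners are filtered out as inactive.
    """
    if not interaction_days:
        return None
    # Enter time: registrations before the course opens are truncated
    # to the course open time.
    enter = max(registration_day, course_open_day)
    last_interaction = max(interaction_days)
    # Assumed reading of the 3-day rule: the learner has dropped out
    # once more than gap_days have passed since the last interaction.
    dropped_out = (now_day - last_interaction) > gap_days
    exit_day = last_interaction if dropped_out else None
    return {"enter": enter, "exit": exit_day, "dropped_out": dropped_out}


early = learner_lifetime(-5.0, [0.5, 2.0, 10.0], now_day=20.0)
late = learner_lifetime(4.0, [4.0, 19.0], now_day=20.0)
```

Here `early` registered before the course opened and went silent after day 10, so it is flagged as a dropout at day 20, while `late` was still active one day before the observation time.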

3. Relational event models

The counting process approach developed in survival and event history analysis (Aalen et al., 2008) provides the foundation for modeling multi-mode and multiplex network event data. Without loss of generality, the network of learners, discussion threads, and assignments in a MOOC course will be used to illustrate the conceptual framework. Three different groups of counting processes will be used since there are three types of events under consideration: forum post, quiz submission, and dropout events. The counting process $N^p_{ij}(t)$ is placed on the edge between learner $i$ and discussion thread $j$ to model post events. The counting process $N^p_{ij}(t)$ increases by one when learner $i$ makes a post on thread $j$ at time $t$. Similarly, the counting process $N^q_{ik}(t)$ is placed on the edge between learner $i$ and quiz $k$ to model submission events, while $N^d_i(t)$ is used to model dropout events.

These counting processes can be modeled by conditional intensity functions $\lambda^p_{ij}(t)$, $\lambda^q_{ik}(t)$, and $\lambda^d_i(t)$, respectively. The Cox proportional hazard form can be used to define these intensity functions:

$$
\begin{aligned}
\lambda^p_{ij}(t \mid H_{t^-}) &= R^p_{ij}(t)\, \lambda^p_{0f}(t) \exp[\theta_p^\top s_p(t, i, j)], \\
\lambda^q_{ik}(t \mid H_{t^-}) &= R^q_{ik}(t)\, \lambda^q_{0k}(t) \exp[\theta_q^\top s_q(t, i, k)], \\
\lambda^d_i(t \mid H_{t^-}) &= R_i(t)\, \lambda^d_0(t) \exp[\theta_d^\top s_d(t, i)],
\end{aligned}
\tag{1}
$$

where $H_{t^-}$ is the network history right before time $t$; $s_p(t, i, j)$, $s_q(t, i, k)$, and $s_d(t, i)$ are vectors of time-dependent network statistics; $\theta_p$, $\theta_q$, and $\theta_d$ are network coefficients to estimate. $R^p_{ij}(t)$ is the “at-risk” indicator, which equals 1 if learner $i$ can post on thread $j$ at time $t$; otherwise, it is zero. In other words, the opportunity for new post events on the edge $(i, j)$ exists only if user $i$ is still active and thread $j$ is already opened. A similar definition is applied for the edge indicator $R^q_{ik}(t)$ between learner $i$ and quiz $k$, and the node indicator $R_i(t)$ for learner $i$. All fine-grained temporal user activities such as clickstream events and thread and assignment open times can be used to define these time-varying risk sets, as discussed in Section 2.1. Finally, $\lambda^p_{0f}(t)$, $\lambda^q_{0k}(t)$, and $\lambda^d_0(t)$ are baseline intensities which should be adjusted depending on the complexity of the event structure. In the model (1), to account for time-varying event rates and simplify the statistical inference, a non-parametric form is assumed for all baseline intensities. Moreover, the quiz baseline intensity $\lambda^q_{0k}(t)$ is allowed to vary across individual quizzes, while the post baseline intensity $\lambda^p_{0f}(t)$ is assumed to be the same only for discussion threads within a forum.

Given that the post event rates of threads in different forums vary, as discussed in Section 2.1, the assumption that post baseline intensities $\lambda^p_{0f}(t)$ are the same only for threads in the same forum $f$ could be the most appropriate one compared to two other popular options that have been frequently used in previous applications of REMs. The first option is an unstratified model that completely ignores the difference in the event rates across forums, i.e. assuming the same baseline intensity across all forums:

$$
\lambda^p_{ij}(t \mid H_{t^-}) = R^p_{ij}(t)\, \lambda^p_{0}(t) \exp[\theta_p^\top s_p(t, i, j)],
\tag{2}
$$

where $\lambda^p_{0f}(t) = \lambda^p_{0}(t)$ for all forums $f$. This assumption could lead to substantial biases in the estimation of network effects. An intuitive way to see this issue is to consider post events in the Week 8 forum. If the unstratified model is used, the likelihood computation for these events will not only compare active threads in that forum with each other, but also match them with all inactive threads in other forums, as shown in Fig. 4(a). Network structures of these established but inactive threads can overshadow those of recently active ones and hence bias the estimation.

The second option is to assume that post events of different forums are actually different event types. This implies that both their baseline intensities and network effects are different, i.e. assuming intensity functions of the form:

$$
\lambda^p_{ij}(t \mid H_{t^-}) = R^p_{ij}(t)\, \lambda^p_{0f}(t) \exp[\theta_{pf}^\top s_p(t, i, j)],
\tag{3}
$$

where both $\lambda^p_{0f}(t)$ and $\theta_{pf}$ vary across all forums $f$. The main issue of this intensity form is its lack of parsimony. Since we have 14 forums, this model implies 14 different sets of network effects for post events. Even if network effects are actually different across all these forums, interpreting this many parameters is difficult. Moreover, the estimation of this model is not reliable since each set of network effects is estimated only from the events of its own forum, and the event counts are very small for some discussion forums, as indicated in Table 1. In summary, our stratification choice makes a good trade-off between the temporal nature of the data and the model interpretation. In Section 4, we will also discuss another benefit of stratification that can increase the efficiency of sampling inference.

Quiz scores can be modeled based on the separability assumption in marked point processes (Cressie, 1993), where the joint probability of submission times and scores has a factorized form. A less technical explanation for modeling event weights in REMs was discussed by Brandes et al. (2009) based on the conditional independence assumption. More specifically, since scores are ordinal categorical data, a proportional odds logistic regression model (Agresti, 2010) can be used by specifying the link function for cumulative logits of the outcome:

$$
\operatorname{logit}[P(Y^q_{ik}(t \mid H_{t^-}) \le l)] = \alpha_l + \beta^\top s_g(t, i, k),
\tag{4}
$$

where $Y^q_{ik}(t \mid H_{t^-})$ is the score corresponding to the submission event $N^q_{ik}(t)$, $\alpha = (\alpha_1, \ldots, \alpha_{10})$ collects the baseline intercepts with $\alpha_l$ corresponding to score level $l$, and $\beta$ are network coefficients to estimate. The vector of time-dependent statistics $s_g(t, i, k)$ for quiz scores can be specified differently from the vector of statistics $s_q(t, i, k)$ for quiz event times and can include information about the current event.

This conceptual framework based on counting processes can in general be applied to more complex networks with many sets of nodes and event types. The main idea is to place counting processes on network units where events occur and then specify dependencies across these counting processes through conditional intensity functions and network statistics. This modeling approach can be summarized in four steps:

1. The first step is to specify the network units on which counting processes are placed. For example, counting processes can be placed on nodes for egocentric events (Vu et al., 2011; Salathé et al., 2013) and on edges between the same set of nodes or different sets of nodes for relational events (Vu et al., 2011; Perry and Wolfe, 2013). It is also possible to consider more complex cases where network units are composed of multiple nodes and edges. For example, a future recommendation feature in MOOC forums could allow for an action event in which one learner forwards an important discussion thread to another learner.
2. The second step is to define the risk sets for the network units of interest, i.e. to specify a time interval, or a set of non-overlapping time intervals, during which events can occur.
3. The third step is to decide how baseline intensities should be stratified, depending on the nature of the relational event data. Different event types are usually assumed to have different baseline rates. Moreover, if event rates differ across subsets of an event type, the baseline intensity can be further stratified, i.e. allowed to vary across these subtypes. If necessary, network effects can also be allowed to vary across these event types or subtypes. In Section 4, we discuss the estimation procedure, which is directly affected by these stratification choices.
4. Finally, guided by substantive theories, a set of network statistics that are expected to describe the complex dependencies across multiple event types should be specified. In Section 5, we define some examples of advanced network statistics that allow us to explore the social learning aspect of MOOCs. They serve as illustrative examples that demonstrate the flexibility of REMs in modeling dynamic network structures.

The stratification approach discussed in our paper is very similar to the multi-level approach for discrete-time event network data (de Nooy, 2011), which is also motivated by the potential application of survival and event history analysis to network modeling. Their major difference lies in the time scale, i.e. continuous-time events versus discrete-time events, which results in different forms of intensity functions (Kalbfleisch and Prentice, 2002). Another difference is that baseline hazard functions are explicitly estimated in discrete-time models (de Nooy, 2011). In our continuous-time models, these baseline hazard functions are nuisance parameters and can only be estimated indirectly, using Breslow's estimator after the partial likelihood maximization step (Cox, 1972). This is an appropriate choice for the scale of network event streams under our consideration.

4. Statistical inference

Thanks to the non-parametric form and stratification assumption of the baseline intensities, as well as the separability assumption between event times and weights, the parameters $\theta$, $\gamma$, $\eta$, $\alpha$, and $\beta$ can be estimated by maximizing the partial likelihood $PL(\theta, \gamma, \eta, \alpha, \beta)$, which is the product of individual likelihoods and partial likelihoods (Andersen et al., 1993):

$$PL(\theta, \gamma, \eta, \alpha, \beta) = PL(\theta) \times PL(\gamma) \times PL(\eta) \times L(\alpha, \beta), \qquad (5)$$

where $PL(\theta)$, $PL(\gamma)$, and $PL(\eta)$ are the partial likelihoods for the models of forum post, quiz submission, and dropout events, respectively, and $L(\alpha, \beta)$ is the likelihood of the proportional odds model for quiz scores. This factorization allows us to estimate the sets of network effects separately. While the cumulative logit parameters for quiz scores can be estimated quickly using standard statistical software packages, partial likelihood inference for the models of forum post, quiz submission, and dropout events is so computationally intensive that it can only be tackled through sampling approaches. Without loss of generality, we focus on the estimation of $\theta$ to discuss how stratification alters the form of the partial likelihoods and of nested case–control sampling.

4.1. Partial likelihoods

For the forum-stratified model (1), the parameters of post events can be estimated by maximizing the partial likelihood (Andersen et al., 1993):

$$PL(\theta) = \prod_{e \in E} \frac{\exp[\theta^\top s(t_e, i_e, j_e)]}{\sum_{i=1}^{n} \sum_{j \in \mathrm{forum}(j_e)} R^p_{ij}(t_e) \exp[\theta^\top s(t_e, i, j)]}, \qquad (6)$$

where $E$ is the set of all post events over the observation time, $n$ is the total number of learners, and the function $\mathrm{forum}(j_e)$ maps thread $j_e$ to the set of threads in its forum. The denominator in (6) therefore sums over all active learners and only over the opened threads in the forum of the current event.
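To make the structure of Eq. (6) concrete, the following minimal Python sketch evaluates a forum-stratified log partial likelihood on toy inputs. The function name, the argument layout, and the callables `at_risk` and `stats` are illustrative assumptions rather than the authors' implementation, and the nested loops are written for readability, not speed.

```python
import math


def forum_stratified_log_pl(events, learners, threads, forum_of, at_risk, stats, theta):
    """Log of a forum-stratified partial likelihood in the spirit of Eq. (6).

    events   : list of (t, i, j) post events
    learners : list of learner ids
    threads  : list of thread ids
    forum_of : dict mapping thread id -> forum id (the stratum)
    at_risk  : function (t, i, j) -> bool, playing the role of R_ij(t)
    stats    : function (t, i, j) -> list of floats, the statistics s(t, i, j)
    theta    : list of floats, the network effects
    """
    def score(t, i, j):
        return sum(c * s for c, s in zip(theta, stats(t, i, j)))

    total = 0.0
    for t, i_e, j_e in events:
        stratum = forum_of[j_e]
        # Denominator: all learners, but only threads in the forum of the event.
        denom = sum(
            math.exp(score(t, i, j))
            for i in learners
            for j in threads
            if forum_of[j] == stratum and at_risk(t, i, j)
        )
        total += score(t, i_e, j_e) - math.log(denom)
    return total
```

Dropping the `forum_of[j] == stratum` filter would turn the denominator into a sum over all threads in all forums, which is the only structural difference between a stratified and an unstratified risk set in this sketch.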

The partial likelihood for the unstratified model (2), on the other hand, has a simpler form:

$$PL(\theta) = \prod_{e \in E} \frac{\exp[\theta^\top s(t_e, i_e, j_e)]}{\sum_{i=1}^{n} \sum_{j=1}^{m} R^p_{ij}(t_e) \exp[\theta^\top s(t_e, i, j)]}, \qquad (7)$$

where $m$ is the total number of threads. Compared to the partial likelihood (6) of the forum-stratified model, however, the partial likelihood (7) is more computationally intensive, because its denominator sums over all opened threads in all forums. This computational advantage of stratification is amplified when it is combined with the nested case–control sampling approach discussed in Section 4.2.

Finally, the forum-stratified model with forum-varying effects (3) yields an even more computationally efficient partial likelihood. Each set of forum-specific parameters $\theta_k$ can be estimated by maximizing a partial likelihood of the form:

$$PL(\theta_k) = \prod_{e \in E_k} \frac{\exp[\theta_k^\top s(t_e, i_e, j_e)]}{\sum_{i=1}^{n} \sum_{j \in k} R^p_{ij}(t_e) \exp[\theta_k^\top s(t_e, i, j)]}, \qquad (8)$$

where $E_k$ is the set of all post events within forum $k$ over the observation time. This separable form allows us to employ parallel computation and, when combined with the nested case–control sampling discussed in the next section, can speed up inference on large data sets. The main disadvantages of this approach are its lack of parsimony, due to the large number of coefficients to interpret, and its loss of efficiency when the number of events in each forum is small, as discussed in Section 3.

4.2. Sampling-based inference

To discuss the sampling-based inference approach, we consider a general form of the partial likelihood:

$$PL(\theta) = \prod_{e \in E} \frac{\exp[\theta^\top s(t_e, i_e, j_e)]}{\sum_{(i,j) \in R(t_e)} \exp[\theta^\top s(t_e, i, j)]}, \qquad (9)$$

where $R(t_e)$ is the risk set for event $e$ at time $t_e$. In the forum-stratified model (1), for example, $R(t_e)$ includes only edges between active learners and opened threads in the forum of the current event. In the unstratified model (2), on the other hand, $R(t_e)$ includes all potential edges between active learners and opened threads. For relational event data sets from online applications, risk sets can be very large. For example, if the number of active learners is 30,000 and the number of opened threads is 1000, the risk set of the unstratified model (2) could contain up to 30,000,000 edges. The computation quickly becomes infeasible, since the numbers of events in these data sets are also very large. Although sparsity can be exploited to reduce the amount of computation, as proposed by Vu et al. (2011) and Perry and Wolfe (2013), the introduction of temporal network statistics, as discussed in Section 5, requires other efficient approaches.

To make partial likelihood inference feasible in these cases, we argue that the best approach is to use sampling. More specifically, our proposed approach combines simple nested case–control sampling with stratification (Borgan et al., 1995). Under nested case–control sampling, for each event (or case) we randomly select a subset of controls from the current risk set $R(t)$ to compute the denominator sum in the partial likelihood (9). This results in a sampled partial likelihood of the form:

$$PL(\theta) = \prod_{e \in E} \frac{\exp[\theta^\top s(t_e, i_e, j_e)]}{\sum_{(i,j) \in \tilde{R}(t_e)} \exp[\theta^\top s(t_e, i, j)]}, \qquad (10)$$

where $\tilde{R}(t_e)$ includes the case and only the sampled controls at the event $e$. In epidemiological studies, one to four controls are commonly selected for each case. However, due to the skewed distributions of network statistics, we recommend using a larger control size, as long as the generated data sets can be stored in computer memory. When the number of events is large, it is also possible to sample over the event set $E$ (Langholz and Borgan, 1995). In short, sampling reduces computational time at the cost of higher memory usage and larger variance of the parameter estimates. Statistical software for conditional logistic regression models, such as the function clogit in R and the procedure PHREG in SAS, can be used for parameter estimation under these sampled partial likelihoods.

When combined with stratification, this simple sampling approach can be made more efficient. This can be illustrated by comparing the sampling ratio, i.e. the control size divided by the current risk set size, of the forum-stratified model (1) with that of the unstratified model (2). Given the same sampling budget, i.e. the control size, the sampling ratio is much larger for the forum-stratified model. Its risk set is smaller, since each of its cases only needs to be compared with potential controls in the same forum, or stratum. Consequently, when the sampling budget is limited, a general approach is to look for a set of nuisance statistics, i.e. statistics whose effects are not of primary interest, and then use them to stratify the risk set $R(t)$. Consistency and asymptotic results for nested case–control sampling estimators have been considered by Borgan et al. (1995). More complex sampling designs are also discussed in their study, though the required programming effort makes them less appealing in the context of network data.

5. Network statistics

To take account of observed heterogeneities on nodes and edges, as well as complex relational and temporal dependencies among event streams, we explore a suite of advanced network statistics, which can be divided into three subsets: dynamic exogenous covariates, temporal statistics, and multi-mode and multiplex network statistics. A complete definition of all network statistics in our case study is presented in Appendix B; only representative examples are selected for this section to demonstrate the flexibility of REMs in modeling dynamic network structures. In general, network statistics should be customized for each REM application depending on the network event structure. Future statistical software for REMs should therefore provide researchers with suitable tools to define their own network statistics, similar to the template approach for ERGMs by Hunter et al. (2013).
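As one possible reading of the idea that REM software should let researchers define their own network statistics, the sketch below registers statistics as plain Python functions and assembles them into a vector $s(t, i, j)$. Everything here is hypothetical: the registry, the decorator, the `(time, learner, thread)` event layout, and the two example definitions (counts of past posts) are assumptions made for illustration and do not describe any existing package or the definitions in Appendix B.

```python
from typing import Callable, Dict, List, Tuple

Event = Tuple[float, str, str]  # assumed layout: (time, learner, thread)
Statistic = Callable[[List[Event], float, str, str], float]

REGISTRY: Dict[str, Statistic] = {}


def statistic(name: str):
    """Decorator that registers a user-defined statistic under a name."""
    def register(fn: Statistic) -> Statistic:
        REGISTRY[name] = fn
        return fn
    return register


@statistic("edge_activity")
def edge_activity(history, t, i, j):
    # Assumed definition: number of posts by learner i in thread j before t.
    return float(sum(1 for te, ie, je in history if te < t and ie == i and je == j))


@statistic("thread_activity")
def thread_activity(history, t, i, j):
    # Assumed definition: number of posts in thread j by any learner before t.
    return float(sum(1 for te, _, je in history if te < t and je == j))


def evaluate(names, history, t, i, j):
    """Build the statistic vector s(t, i, j) for the requested names."""
    return [REGISTRY[name](history, t, i, j) for name in names]
```

With such an interface, adding a statistic to a model amounts to writing one function and listing its name, which is the kind of flexibility the template approach aims for.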

5.1. Dynamic exogenous covariates

Our first set of network statistics is an explicit introduction of dynamic exogenous covariates into relational event modeling. They differ from the dynamic endogenous statistics in Butts (2008) in that they can be continuously updated by covariate events that are external to the main event outcome. Besides static endogenous covariates, these two types of dynamic statistics provide REMs with rich modeling options to express the temporal effects of local network structures and the surrounding context on the event process.

Fig. 7. Counting processes of a sample of individual learners across three event types show the recency effect. Event times or jump points tend to be clustered in groups rather than equally distributed over time.

User quiz scores: The first example of dynamic exogenous covariates is the user quiz scores covariate, which can be used to model the effects of learners' quiz performance on their forum activities. In this modeling problem, submission events for the three graded quizzes are external to forum post events. Learners can submit each quiz multiple times and, since submission deadlines overlap, learners may submit their second quiz before resubmitting their first one (see Fig. 5(a)). Due to this complexity, we define learner quiz performance by a dynamic covariate that is equal to the cumulative score over the three graded quizzes, where the current best score of each quiz is used in the summation. Specifically, the user quiz scores covariate for learner $i$ at time $t$ is defined as follows:

$$s(t, i) = \max_{q_1 \in Q_1(t,i)} g_{q_1} + \max_{q_2 \in Q_2(t,i)} g_{q_2} + \max_{q_3 \in Q_3(t,i)} g_{q_3}, \qquad (11)$$

where $Q_1(t, i)$, $Q_2(t, i)$, and $Q_3(t, i)$ are the sets of submissions by time $t$ from learner $i$ for graded quizzes 1, 2, and 3, respectively, and $g_q$ is the grade of submission $q$. The default score for a quiz without a submission is zero.

User video view time: In the above example, while quiz submission events are external to forum post events, we do model them, as discussed in Section 3. Our second example demonstrates another possibility, where dynamic exogenous covariates are constructed from external events that are not of interest as outcomes. To model the effects of clickstream behaviors on course dropout, forum post, and quiz submission events, we define the user video view time predictor from clickstream data as follows:

$$s(t, i) = \sum_{e \in E_{video}(t,i)} v_e, \qquad (12)$$

where $E_{video}(t, i)$ is the set of video view events recorded for learner $i$ by time $t$, and $v_e$ is the amount of time that learner $i$ spent watching the video during interaction event $e$. Clickstream events, in this case, are not main outcomes and hence are external to all events of interest.

5.2. Temporal statistics

Our second set of network statistics demonstrates how event timestamps can be used to address two temporal aspects of network event processes. The first observation is that interaction events tend to be clustered in groups rather than equally distributed over time. Fig. 7, which plots the counting processes of a sample of learners across three different event types, reveals clusters of behavioral events. This phenomenon can also be interpreted as the recency effect, i.e. an increase in the event likelihood following a recent event (Aalen et al., 2004). The second temporal aspect that should be taken into account is the differential importance of new events versus those far in the past. To account for this, prior approaches (Vu et al., 2011; Perry and Wolfe, 2013; Quintane et al., 2013) separate network statistics into long-term and short-term indicators, which can result in less parsimonious models. To achieve parsimony, we apply time-weighting methods (Vu et al., 2014) in the computation of count-based network statistics, especially for those extracted from highly volatile clickstream data.

User post recency: This statistic is defined as the gap between the current time and the learner's last post time, and is used to model recency in forum post activities. In particular, the user post recency covariate for learner $i$ at time $t$ is defined as follows:

$$s(t, i) = t - \max_{e \in N^p_{i\cdot}(t)} t_e, \qquad (13)$$

where $N^p_{i\cdot}(t) = \bigcup_j N^p_{ij}(t)$ is the union of all post events from learner $i$ by time $t$. A negative coefficient for this statistic provides empirical evidence for the recency effect in forum post activities, i.e. the post event likelihood increases following a recent post, or equivalently learner post events tend to be clustered into groups rather than equally distributed over time.

User forum view: This statistic is defined as the total number of forum views that a learner has made. To account for temporal relevance in clickstream data, events are also weighted by their timestamps as follows:

$$s(t, i) = \sum_{e \in E_{forum}(t,i)} w(t, t_e), \qquad (14)$$

where $E_{forum}(t, i)$ is the set of forum view events recorded for learner $i$ by time $t$ and $t_e$ is the time of view event $e$. The function $w(t, t_e)$ defines the temporal weight of event $e$:

$$w(t, t_e) = I[(t - t_e) \le 1.0] + \frac{1}{t - t_e} \times I[(t - t_e) > 1.0], \qquad (15)$$

where the time unit is days, and $I(x)$ is an indicator function that equals 1 if the statement $x$ is true and 0 otherwise. This function imposes a weighting scheme in which events within the last day are all weighted as 1.0, while the weights of events further in the past decay over time.
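The covariates in Eqs. (11) and (13)–(15) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the tuple layouts are invented, "by time $t$" is read as strictly before $t$, and `user_post_recency` returns `None` for a learner with no earlier post because the text does not define that case.

```python
def user_quiz_scores(submissions, t, learner):
    """Eq. (11): sum over graded quizzes of the best grade submitted before t.

    submissions: list of (time, learner, quiz_id, grade); an assumed layout.
    A quiz without a submission contributes zero.
    """
    best = {}
    for te, i, quiz, grade in submissions:
        if i == learner and te < t:
            best[quiz] = max(best.get(quiz, 0.0), grade)
    return sum(best.values())


def user_post_recency(post_times, t):
    """Eq. (13): gap between t and the learner's most recent post before t."""
    earlier = [te for te in post_times if te < t]
    # Assumption: the statistic is left undefined (None) without earlier posts.
    return t - max(earlier) if earlier else None


def temporal_weight(t, te):
    """Eq. (15): weight 1 within the last day, 1 / (t - te) afterwards (days)."""
    gap = t - te
    return 1.0 if gap <= 1.0 else 1.0 / gap


def user_forum_view(view_times, t):
    """Eq. (14): time-weighted count of the learner's forum views before t."""
    return sum(temporal_weight(t, te) for te in view_times if te < t)
```

For example, views made 0.5, 2, and 4 days before the current time receive weights 1, 0.5, and 0.25, so the weighted count is 1.75 rather than 3.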

5.3. Multi-mode and multiplex network statistics

Our third set of network statistics is proposed to account for complex dependencies in multi-mode and multiplex event structures. We first discuss a three-path statistic and its interaction with a nodal statistic, which can be used to test for four-cycle closure effects in bipartite event networks. We conclude our discussion of network statistics by considering a vote statistic that is constructed from two different event types on two sets of nodes.

Fig. 8. Three examples of multi-mode and multiplex network statistics. Rectangles and circles denote learners and discussion threads, respectively. Solid edges are past events, while dashed lines stand for edges being modeled. Arrows represent votes from learners to forum posts contributed by others. Shaded rectangles indicate high-performance learners, while plus and minus signs denote up and down votes, respectively.

Three-paths: This statistic is equal to the number of three-paths from learner $i$ to discussion thread $j$ and is formally defined as follows:

$$s(t, i, j) = \sum_{l \ne j} I[N_{il}(t^-) > 0] \times \sum_{k \ne i} I[N_{kj}(t^-) > 0] \times I[N_{kl}(t^-) > 0]. \qquad (16)$$

In this formula, $k$ and $l$ index learners and threads, respectively. A positive coefficient for this three-paths statistic is expected and can be interpreted as follows: learners who shared knowledge in the past tend to contribute to the same discussion threads in the future, i.e. they close their current three-paths as shown in Fig. 8(a). In other words, the statistic allows us to test whether learners are more likely to maintain their knowledge sharing collaborations over time.

Three-paths and quiz scores: To differentiate knowledge sharing patterns between low- and high-performance learners, we also consider an interaction statistic between the number of three-paths and the quiz scores of the learners on these paths. Formally, the three-paths and quiz scores interaction for learner $i$ and thread $j$ at time $t$ is defined as follows:

$$s(t, i, j) = q(t, i) \times \sum_{l \ne j} I[N_{il}(t^-) > 0] \times \sum_{k \ne i} q(t, k) \times I[N_{kj}(t^-) > 0] \times I[N_{kl}(t^-) > 0], \qquad (17)$$

where $q(t, i)$ is the cumulative quiz score of learner $i$ by time $t$ as defined in Eq. (11). A positive coefficient for this statistic implies that knowledge sharing collaboration tends to be stronger among high-performance learners than among low-performance ones, as illustrated in Fig. 8(b) and (a), respectively.

User forum votes: Since forum users can vote up or down on posts made by other users, the user forum votes covariate is defined to test whether this voting scheme encourages learners to contribute more to discussions. This statistic measures the popularity of the forum contributions of a learner $i$ and is defined as the difference between the up votes and down votes on all of her posts so far:

$$s(t, i) = \sum_{j=1}^{m} \sum_{v=1}^{|V(t,i,j)|} r_v, \qquad (18)$$

where $V(t, i, j)$ is the set of votes that learner $i$ has received on thread $j$, and $r_v$ is the rating of vote $v$, which takes the value +1, 0, or −1 for up, neutral, or down votes, respectively. For example, learner L1 in Fig. 8(c) receives two up votes for her contribution in thread T1, while in thread T2 she receives a mixed review of one up vote and one down vote. Her user forum votes statistic therefore equals 2. As illustrated in the figure, the computation of this statistic involves both post and vote events between two sets of learners and threads.
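The statistics in Eqs. (16)–(18) can be sketched as follows. The representation is an assumption made for illustration: `posts` is the set of (learner, thread) pairs with at least one post before the current time, `quiz` maps learners to their cumulative quiz scores $q(t, \cdot)$, and `votes` lists (receiving learner, thread, rating) triples; none of these names come from the paper.

```python
from collections import defaultdict


def _index(posts):
    """Index past posts by learner and by thread."""
    threads_of = defaultdict(set)
    learners_of = defaultdict(set)
    for learner, thread in posts:
        threads_of[learner].add(thread)
        learners_of[thread].add(learner)
    return threads_of, learners_of


def three_paths(posts, i, j, quiz=None):
    """Eq. (16) when quiz is None; Eq. (17) when quiz scores are supplied."""
    threads_of, learners_of = _index(posts)
    total = 0.0
    for l in threads_of.get(i, set()):        # threads l != j where i has posted
        if l == j:
            continue
        for k in learners_of[l]:              # learners k != i who also posted in l
            if k != i and j in threads_of[k]:  # ... and who have posted in j
                total += 1.0 if quiz is None else quiz.get(k, 0.0)
    return total if quiz is None else quiz.get(i, 0.0) * total


def user_forum_votes(votes, learner):
    """Eq. (18): up votes minus down votes received by the learner so far."""
    return sum(rating for receiver, _thread, rating in votes if receiver == learner)
```

Applied to the example of Fig. 8(c), two up votes in thread T1 plus one up vote and one down vote in thread T2 give learner L1 a statistic of 2.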

Table 3
Summary statistics of three event types that are used in the analysis. All events occurred during the class open time are used in the estimation. Risk sets are changed over time depending on the number of active learners and the number of opened discussion threads.

| Summary statistics | Drop-out | Forum post | Quiz submission |
|---|---|---|---|
| Number of events | 28,263 | 7141 | 14,140 |
| Mean risk set size | 15,626 | 726,283 | 13,220 |
| Max risk set size | 24,497 | 1,807,920 | 19,113 |
| Min risk set size | 5465 | 9478 | 5644 |
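The risk set sizes in Table 3 show why sampling is needed for forum post events in particular. As a rough illustration of how nested case–control sampling combines with forum stratification (Section 4.2), the sketch below builds the stratified risk set for one event and draws controls from it. The function names and the event layout are hypothetical, and all learners and threads are treated as at risk for simplicity.

```python
import random


def stratified_risk_set(event, learners, threads, forum_of):
    """Edges (i, j) restricted to threads in the forum of the event's thread."""
    _, _, j_e = event
    stratum = forum_of[j_e]
    return [(i, j) for i in learners for j in threads if forum_of[j] == stratum]


def nested_case_control(event, risk_set, n_controls, rng):
    """Return the case followed by up to n_controls randomly sampled controls.

    If the risk set holds no more than n_controls controls, all of them are kept.
    """
    _, i_e, j_e = event
    case = (i_e, j_e)
    controls = [unit for unit in risk_set if unit != case]
    if n_controls < len(controls):
        controls = rng.sample(controls, n_controls)  # sampling without replacement
    return [case] + controls
```

The sampled list plays the role of $\tilde{R}(t_e)$ in Eq. (10); because the stratified risk set is smaller, the same number of controls covers a larger share of it.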

6. Results

The model for dropout, forum post, and quiz submission events defined by the set of Eqs. (1) and (4) is fit to all events observed during the course open time from March 31 to May 31, 2013. Summary statistics of these events are listed in Table 3. Due to the large size of the risk sets, nested case–control sampling with 10,000 controls for each case is used for all event outcomes. If the control size is larger than the current risk set size, all network units at risk are included in the sample. A Java package was implemented to generate nested case–control data from the MOOC event streams, which are then fed into the Cox proportional hazards procedure PHREG in SAS for the estimation of network effects. In the estimation of the proportional odds model for quiz scores, sampling is not necessary and all data are fitted using the function polr in R.

Table 9 lists all network statistics in our analysis. Only descriptive definitions of these network statistics are provided there; their mathematical formulas are given in Appendix B. All estimated models in the paper are obtained after covariates and network statistics are standardized, i.e. the means and variances of the transformed variables are equal to 0 and 1, respectively. This transformation allows us to compare the relative strength of different network and covariate effects. Moreover, depending on the event outcome, different subsets of these network statistics are considered. For example, the set of three-path statistics is only relevant to the model of forum post events, while the cohort statistic measuring user active time is included in all models. In our analysis, statistical significance is assessed at the 0.01 significance level.

Before turning to the specification and interpretation of the estimated models for the three event outcomes, we evaluate the predictive potential of behavioral clickstream statistics.
They are expected to capture the important dynamics of interactions between learners and their learning materials. Table 4 shows the AICs of the three event models with and without clickstream statistics, which include user forum view, user wiki view, user video view time, thread view, and edge view. The last two statistics are only relevant for the model of post events. Consistently across all models, behavioral clickstream predictors improve the model fit, and all likelihood ratio tests comparing the two nested versions are statistically significant. Though this analysis has shown the predictive potential of clickstream data, there are still many avenues for developing more fine-grained clickstream statistics, such as those that can capture the sequential order of events (Mannila et al., 1997).

Table 4
AICs of four regression models with and without clickstream statistics. Models with clickstream statistics have better fits since their AICs are smaller. The P-values of likelihood ratio tests comparing models with and without clickstream statistics are all smaller than 0.01.

| Clickstream | Drop-out events | Forum posts | Quiz events | Quiz scores |
|---|---|---|---|---|
| No | 513,405 | 72,265 | 234,211 | 57,287 |
| Yes | 512,243 | 70,352 | 232,399 | 56,959 |

6.1. Predictors of dropout events

Four groups of network statistics are considered in the model for dropout events. The first three statistics are user active time, user quiz scores, and user pass achievement, which are used to control for the effects of learners' course tenure, quiz performance, and accomplishment status on their dropout decisions. Besides these cohort and performance indicators, which are broadly used in learning analytics, we also consider two sets of behavioral and social statistics, i.e. clickstream and user forum statistics. The resulting model in Table 5 is estimated using 28,263 dropout events observed during the course open time.

Table 5
Coefficient estimates for network effects in the dropout model.

| Network effect | Coefficient | Standard error | P-value |
|---|---|---|---|
| User active time | −0.50165 | 0.00883 | <0.0001 |
| User quiz scores | −0.10145 | 0.01255 | <0.0001 |
| User pass achievement | 0.09705 | 0.01026 | <0.0001 |
| User forum view | −0.50260 | 0.03612 | <0.0001 |
| User wiki view | −0.19812 | 0.01072 | <0.0001 |
| User video view time | −0.05492 | 0.01019 | <0.0001 |
| User questions | −0.03102 | 0.01522 | 0.0415 |
| User degree | 0.21670 | 0.02768 | <0.0001 |
| User activity | −0.16401 | 0.04769 | 0.0006 |
| User degree × activity | −0.01045 | 0.00248 | <0.0001 |
| User forum votes | 0.00231 | 0.00582 | 0.6916 |

The negative coefficient of user active time implies an interesting pattern: learners who have been taking the class since the start date are less likely to drop out than those who joined later. Quiz performance is also negatively correlated with the likelihood of dropout events, i.e. learners with high quiz scores are less likely to drop out. The positive effect of the user pass achievement indicator, however, reveals an intriguing pattern: learners who have already achieved the passing score are more likely to drop out. This indicator therefore helps to distinguish genuine dropout events from those of accomplished learners. Learner interactions with course materials recorded in clickstream data are also associated with the likelihood of dropout events. Learners who spend more time watching video lectures or viewing wiki or forum pages are less likely to drop out. The relation between forum activities and the likelihood of dropping out, on the other hand, is more complex. The negative coefficient of user activity implies that learners who actively contributed to forum discussion, i.e.
having a high number of posts, are less likely to drop out. However, those whose user degree is high, i.e. whose discussion activities are dispersed across multiple threads, are more likely to drop out. We speculate that learners who took the course seriously tended to focus their knowledge exchanges intensively on a small number of topics, i.e. they had high activity but low degree. Less motivated learners, on the other hand, contributed to discussions sporadically across many threads and did not follow them through, i.e. they had high degree but low activity. The negative interaction effect between these two statistics also supports this speculation. Finally, the numbers of questions that learners posted and of favorite votes that they received are not correlated with the likelihood of dropout events, though they were still included in the model to control for such potential effects.

6.2. Local network structures in forums

To learn about collaborative structures among learners across discussion threads, all covariates and network statistics in Table 9 are included in the model of post events, except user quiz recency and user current quiz score, which are not relevant for post events. Two aggregate covariates of quiz performance, i.e. user quiz scores and user pass achievement, are also considered in order to study the role of learning performance in social interactions. The reverse direction of this relationship between learning performance and social interactions is studied in the model of submission events discussed in Section 6.3. Table 6 lists the resulting model, which was estimated using 7141 forum post events that occurred during the course open time.

Table 6
Coefficient estimates for network effects in the forum-stratified model.

| Network effect | Coefficient | Standard error | P-value |
|---|---|---|---|
| User questions | 0.00854 | 0.00297 | 0.0040 |
| User degree | 0.06497 | 0.00477 | <0.0001 |
| User activity | 0.03234 | 0.00293 | <0.0001 |
| User degree × activity | −0.00125 | 0.00005 | <0.0001 |
| Thread view | −0.00879 | 0.02068 | 0.6709 |
| Thread degree | 0.39775 | 0.01723 | <0.0001 |
| Thread activity | 0.55505 | 0.02176 | <0.0001 |
| Thread degree × activity | −0.07150 | 0.00351 | <0.0001 |
| Thread degree and quiz scores | 0.11334 | 0.02384 | <0.0001 |
| Degree assortativity | −0.00742 | 0.00261 | 0.0045 |
| Activity assortativity | −0.01181 | 0.00156 | <0.0001 |
| Edge view | 0.00106 | 0.00058 | 0.0677 |
| Edge activity | 0.06807 | 0.00075 | <0.0001 |
| Three-paths | 0.01061 | 0.00116 | <0.0001 |
| Three-paths and quiz scores | 0.00970 | 0.00131 | <0.0001 |
| Edge activity × three-paths | −0.00042 | 0.00002 | <0.0001 |
| Edge activity × three-paths and quiz scores | 0.00001 | 0.00001 | 0.6313 |
| User two-paths | 0.08114 | 0.00578 | <0.0001 |
| User two-paths and quiz scores | −0.09779 | 0.00551 | <0.0001 |
| Thread two-paths | 0.03763 | 0.02589 | 0.1461 |
| Thread two-paths and quiz scores | −0.24693 | 0.03985 | <0.0001 |
| User post recency | −2.95812 | 0.05561 | <0.0001 |
| Thread age | −2.43842 | 0.05563 | <0.0001 |
| User forum votes | −0.08822 | 0.00800 | <0.0001 |
| Thread forum votes | 0.29511 | 0.01868 | <0.0001 |
| Edge forum votes | 0.00187 | 0.00200 | 0.3493 |
| Forum vote assortativity | −0.00115 | 0.00332 | 0.7289 |
| User active time | 1.38664 | 0.06819 | <0.0001 |
| User forum view | 0.06158 | 0.00268 | <0.0001 |
| User wiki view | 0.10600 | 0.00247 | <0.0001 |
| User video view time | 0.13009 | 0.00995 | <0.0001 |
| User quiz scores | 0.15695 | 0.02289 | <0.0001 |
| User pass achievement | −0.00341 | 0.00985 | 0.7293 |

The model successfully uncovers many interesting collaboration patterns in the knowledge exchange process among learners. First, learners' information seeking activities in the forum, measured by user questions, are positively correlated with the likelihood of their further post events. Moreover, the positive effects of both user degree and user activity imply that not only the breadth but also the intensity of forum contributions is associated with more activity in the future. The negative interaction between them, however, indicates a non-linear relationship with the post event outcome. High values of learners' activity decrease the effect of their degrees, and vice versa.


Secondly, the effects of thread popularity on attracting more post events vary across different measurements. While thread popularity measured by the breadth and intensity of previous posts, i.e. thread degree and thread activity, is positively linked to the likelihood of more post events, thread popularity measured by view events, i.e. thread view, is not statistically significant. Moreover, high quiz performance among the contributors to a thread, measured by the interaction between thread degree and quiz scores, also predicts a higher likelihood of future post events on it. In other words, threads are more attractive if they draw contributions from high-performance learners. Thirdly, there is empirical evidence of anti-rich-club effects in terms of both node degrees and activities. Both assortativity effects, degree assortativity and activity assortativity, are negative, which means that highly active learners are less likely to participate in popular threads. Fourthly, three main edge statistics, edge activity, three-paths, and three-paths and quiz scores, are positive and statistically significant. This means that learners tend to keep posting on threads to which they have already contributed, and that learners with common interests in the past tend to take part in the same future discussions. In other words, they tend to maintain their knowledge exchange collaborations over time. Moreover, this collaborative pattern is even stronger among high-performance learners, as indicated by the positive effect of the interaction between three-paths and learners’ quiz scores. The edge view effect, however, is positive but not statistically significant, which implies that learners’ view and post activities are not strongly correlated. Finally, although the four two-paths statistics are included as control factors for four-cycle closure effects, their effects might also have meaningful interpretations. For example, while post events are more likely to be initiated by learners engaging with popular threads, i.e. when user two-paths are high, their likelihood decreases if learners on these two-paths have good quiz scores. In other words, high-performance learners who have exchanged knowledge with other high-performance learners tend to be more selective in contributing new posts.

Regarding temporal effects, the negative coefficient of user post recency implies a recency effect in post activities, i.e. there is an increase in the post event likelihood following a recent post. Another interpretation of this recency effect is that a learner's post events are clustered into groups rather than equally distributed over time. Similarly, the negative coefficient of thread age can be interpreted as showing that recently opened threads attract more attention than those created further in the past.

The voting scheme, in general, is only effective in attracting more discussions on highly rated threads rather than in encouraging most-favored learners to continue their forum contributions. More specifically, the positive coefficient of thread forum votes means that highly rated threads tend to receive more new posts than less favored ones. The negative effect of user forum votes, on the other hand, indicates that the voting scheme does not provide learners with an incentive to take part in future discussions. Finally, favorable votes on posts that learners have contributed to a thread do not encourage them to post more on it, and there is no evidence of a rich-club effect between learners and discussion threads in terms of the forum votes they received. Neither the edge forum votes effect nor the forum vote assortativity effect is statistically significant.

All four cohort and clickstream effects are statistically significant. Their positive signs can be interpreted as showing that learners who have been in the class since the beginning, or who spent more time interacting with learning materials including discussion threads, wiki pages, and lecture videos, are more likely to contribute to the knowledge exchange process. Similarly, quiz performance is also positively correlated with post events, i.e. the higher a learner's cumulative quiz score, the more likely she is to post to the forum. The course accomplishment status, however, has no effect on their

Table 7 Coefficient estimates in the quiz model for submission events.

Network statistic          Coefficient  Standard error  P-value
User active time               0.88572         0.02336  <0.0001
User forum view                0.03213         0.00498  <0.0001
User wiki view                 0.14967         0.00327  <0.0001
User video view time           0.08675         0.00555  <0.0001
User questions                 0.00196         0.00644   0.7614
User degree                   −0.01784         0.01118   0.1107
User activity                  0.01355         0.00711   0.0566
User degree × activity        −0.00008         0.00021   0.7147
User forum votes              −0.02416         0.01856   0.1931
User quiz recency             −2.09012         0.02350  <0.0001
User quiz scores               0.59945         0.02099  <0.0001
User pass achievement         −0.16061         0.00651  <0.0001
User current quiz score       −0.78796         0.01353  <0.0001

Table 8 Coefficient estimates of main effects in the quiz model for submission scores.

Network statistic          Coefficient  Standard error  P-value
User active time              −0.04031         0.00228  <0.0001
User forum view                0.02943         0.00400  <0.0001
User wiki view                 0.02590         0.00324  <0.0001
User video view time           0.75109         0.07379  <0.0001
User questions                 0.01854         0.03162  0.55758
User degree                   −0.02601         0.01181  0.02759
User activity                  0.00847         0.02956  0.77442
User degree × activity         0.00038         0.00063  0.54788
User forum votes               0.00772         0.00569  0.17458
User quiz recency              0.04407         0.00256  <0.0001
User quiz scores               0.09386         0.00513  <0.0001
User pass achievement         −0.28058         0.06831  <0.0001
User current quiz score        0.31193         0.00855  <0.0001

forum activities. While these two statistics help to answer the question of whether quiz performance promotes social interactions, the effects of prior social interactions on future quiz performance are explored in the next section.

6.3. The mutual dependence between forum activities and quiz performance

The model of quiz submissions includes two submodels, one for events and one for scores. The same set of network statistics is used in both models. It contains four groups of network statistics similar to those in the dropout model, plus two additional statistics that are only relevant to submission events: user quiz recency and user current quiz score. Besides answering the question of how social interactions in forums are related to learning performance, this model also helps to investigate how prior learning performance predicts future learning performance. Tables 7 and 8 show the resulting models for submission events and scores, respectively, which were estimated using 14,140 events recorded during the course open time. Estimated baseline intercepts in the model of submission scores are shown in Table C.1.

We will interpret network effects on both submission events and scores in Tables 7 and 8 simultaneously to highlight their similarities and differences. Firstly, learners who have been in the class since the beginning tend to make more submissions, but obtain lower scores than those joining the class later. Secondly, while learners’ clickstream behaviors, i.e. their interactions with learning materials, are positively correlated with submission events and scores, their forum activities have no effect on either outcome. Therefore, the predictive relationship between learning performance and social interactions might not be bi-directional as expected. While high learning performance is positively correlated with active social interactions, the other direction of the predictive relationship is not supported by our current analysis.

Table 9 Descriptive definitions of all network statistics that are considered in our case study. Their mathematical formulas are given in Appendix B.

Network statistic: Descriptive definition

Forum user statistics
User questions: The current number of questions that a learner has posted to the forum, which measures her information seeking activity in the forum.
User degree: The current number of threads to which a learner has posted, which measures the breadth of her forum contributions.
User activity: The current number of posts that a learner has made, which measures the intensity of her forum contributions. Events are weighted by their timestamps using the function (15).
User degree × activity: The interaction between user degree and user activity, which measures how the breadth of a learner’s forum posts changes the effect of her contribution intensity, and vice versa.

Forum thread statistics
Thread view: The current number of view events on a thread weighted by their timestamps, which is one of the measurements of thread popularity.
Thread degree: The current number of learners who have posted on a thread, which is another measurement of thread popularity.
Thread activity: The time-weighted number of posts that a thread has received, which measures the intensity of its popularity.
Thread degree × activity: The interaction between thread degree and thread activity statistics.
Thread degree and quiz scores: The interaction between thread degree and user quiz scores, used to differentiate the popularity of a thread further based on the quiz performance of learners who have contributed to it.

Forum assortativity statistics
Degree assortativity: The interaction between user degree and thread degree to test the assortativity in terms of node degrees.
Activity assortativity: The interaction between user activity and thread activity to test the assortativity in terms of node activities.

Forum two-path statistics
User two-paths: The number of two paths from a learner, which measures the popularity of the discussion threads that she has engaged with.
User two-paths and quiz scores: An interaction between user two-paths and quiz scores of learners on these paths.
Thread two-paths: The number of two paths from a thread, which measures the breadth of forum contributions of learners who have engaged with it.
Thread two-paths and quiz scores: An interaction between thread two-paths and quiz scores of learners on these paths.

Forum edge statistics
Edge view: The time-weighted cumulative number of view events from a learner to a thread.
Edge activity: The time-weighted cumulative number of post events from a learner to a thread.
Three-paths: The number of three paths from a learner to a thread, which is used to test whether learners tend to maintain their knowledge sharing collaborations over time.
Three-paths and quiz scores: An interaction between three-paths and quiz scores of learners on these paths to differentiate collaboration levels between low and high-performance learners.
Edge activity × three-paths: An interaction between edge activity and three-paths to control for the decrease of the three-paths effect in the presence of previous posts.
Edge activity × three-paths and quiz scores: An interaction between edge activity and three-paths and quiz scores to control for the decrease of the three-paths and quiz scores effect in the presence of previous posts.

Forum temporal statistics
User post recency: The gap time between the current time and the last post time, to model the recency effect or the clustering of forum post activities.
Thread age: The gap time between the current time and the opened time of a thread. A negative coefficient implies the aging effect of discussion threads.

Forum vote statistics
User forum votes: The cumulative number of up votes minus the cumulative number of down votes for a learner.
Thread forum votes: The cumulative number of up votes minus the cumulative number of down votes on a thread.
Forum vote assortativity: An interaction between user forum votes and thread forum votes to test the assortativity in terms of forum votes.
Edge forum votes: The cumulative number of up votes minus the cumulative number of down votes on posts between a learner and a thread.

Cohort statistics
User active time: The gap between the current time and the enter time of a learner, which is used to control for the cohort effect.

Clickstream statistics
User forum view: The number of times a learner has viewed discussion threads in the forum.
User wiki view: The number of times a learner has viewed wiki pages.
User video view time: The total time that a learner has spent on watching video lectures.

Quiz statistics
User quiz recency: The gap time between the current time and the last quiz submission time, which is included to model the recency effect or the clustering of submission events over time.
User quiz scores: The cumulative score of three graded quizzes that a learner has received.
User pass achievement: An indicator that is equal to 1 if user quiz scores is greater than 20. It is used to control for the scenario that learners cease the class after achieving the assessment threshold.
User current quiz score: The best score on a quiz that a learner has received.
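To make a few of the definitions in Table 9 concrete, the following minimal Python sketch computes user degree, thread degree, the time-weighted activity statistics, degree assortativity, and the pass indicator from a stream of post events. The exact formulas are those of Appendix B and the weight function (15), which are not reproduced here; the half-life decay in `decay_weight`, the function names, and the `(time, learner, thread)` event layout are illustrative assumptions only.

```python
from collections import defaultdict


def decay_weight(event_time, now, half_life):
    """Hypothetical time weight (NOT the paper's function (15)):
    an event loses half of its weight every `half_life` time units."""
    return 0.5 ** ((now - event_time) / half_life)


def forum_statistics(posts, now, half_life=7.0):
    """Summarise post events (time, learner, thread) observed up to `now`."""
    user_threads = defaultdict(set)       # learner -> threads posted on
    thread_users = defaultdict(set)       # thread -> learners who posted
    user_activity = defaultdict(float)    # time-weighted posts per learner
    thread_activity = defaultdict(float)  # time-weighted posts per thread
    edge_activity = defaultdict(float)    # time-weighted posts per (learner, thread)
    for time, learner, thread in posts:
        if time > now:
            continue  # events after `now` are not part of the history
        weight = decay_weight(time, now, half_life)
        user_threads[learner].add(thread)
        thread_users[thread].add(learner)
        user_activity[learner] += weight
        thread_activity[thread] += weight
        edge_activity[(learner, thread)] += weight
    return {
        "user_degree": {u: len(t) for u, t in user_threads.items()},
        "thread_degree": {t: len(u) for t, u in thread_users.items()},
        "user_activity": dict(user_activity),
        "thread_activity": dict(thread_activity),
        "edge_activity": dict(edge_activity),
    }


def degree_assortativity(stats, learner, thread):
    """Interaction (product) of user degree and thread degree."""
    return stats["user_degree"].get(learner, 0) * stats["thread_degree"].get(thread, 0)


def pass_achievement(cumulative_quiz_score, threshold=20):
    """Indicator equal to 1 once the cumulative quiz score exceeds the threshold."""
    return 1 if cumulative_quiz_score > threshold else 0


if __name__ == "__main__":
    example_posts = [(0.0, "a", "t1"), (7.0, "a", "t2"), (7.0, "b", "t1")]
    example_stats = forum_statistics(example_posts, now=7.0)
    print(example_stats["user_degree"], example_stats["user_activity"])
    print(degree_assortativity(example_stats, "a", "t1"))
```

In a full relational event analysis these quantities would be recomputed, or updated incrementally, at every event time for each learner and thread in the risk set; the sketch recomputes them from scratch only for readability.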

Finally, all estimated effects of quiz statistics are statistically significant in both outcome models. As with post events, submission events are clustered over time, as indicated by the negative effect of user quiz recency in the event model. The positive effect of this covariate in the score model, however, means that learners who submit their quizzes too closely in time tend to receive lower scores. Moreover, learners with high quiz performance tend to resubmit quizzes and obtain high scores, although both tendencies weaken once they have already met the pass requirement. The estimated effects of user current quiz score on event and score outcomes also reveal an interesting pattern. Learners with high scores on a quiz are less likely to revise and resubmit their solutions, but if they do, they tend to receive better scores.
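As a reading aid for Tables 7 and 8, the short sketch below recomputes a test statistic and a two-sided p-value from a coefficient and its standard error. It assumes the reported p-values come from a normal-approximation Wald test, which the tables do not state explicitly but which reproduces the printed values closely for the rows tried. The `exp(coef)` column is the usual multiplicative reading of a coefficient under a log-linear event intensity, which is also an assumption here, and the units of each statistic may have been transformed before estimation.

```python
import math


def wald_test(coefficient, standard_error):
    """Return (z, two-sided p-value) under a standard normal approximation."""
    z = coefficient / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# (coefficient, standard error) pairs copied from Table 7 (submission events).
table7_rows = {
    "User questions": (0.00196, 0.00644),
    "User degree": (-0.01784, 0.01118),
    "User activity": (0.01355, 0.00711),
    "User quiz recency": (-2.09012, 0.02350),
}

if __name__ == "__main__":
    for name, (coef, se) in table7_rows.items():
        z, p = wald_test(coef, se)
        print(f"{name:18s} z = {z:9.3f}  p = {p:.4f}  exp(coef) = {math.exp(coef):.4f}")
```

Under these assumptions, the forum statistics in Table 7 have small test statistics relative to their standard errors, which is what underlies the statement that forum activities are not significantly related to submission events.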


7. Discussion

This paper has considered three extensions to the relational event framework in response to the increasing availability of more complex event data from online applications. These extensions include a flexible stratification approach for multi-mode and multiplex event structures, a combination of nested case–control sampling and stratification for statistical inference on large data sets, and a suite of more advanced network statistics. However, many open research questions in relational event modeling remain to be addressed in future work.

The first direction is to allow for time-varying network effects, especially for network processes observed over long time periods. Few studies have explored this modeling direction for network data (Vu et al., 2011), although such models have been considered extensively in other applications of survival and event history analysis (Martinussen and Scheike, 2006). In learning analytics, for example, as students move through different phases of a course and become more familiar with the learning environment, their interactions and behaviors are expected to change as well: their knowledge sharing collaborations might become stronger, resulting in a larger four-cycle closure effect.

The second direction is to go beyond the current set of aggregated network statistics and consider more fine-grained ones in combination with statistical regularization techniques (Tibshirani, 1996) to filter out irrelevant effects. In the regression model for quiz scores, for example, by using the user degree statistic, we assume a homogeneous effect of forum engagement across all discussion threads. However, such an assumption cannot capture the reality that different groups of discussion threads are relevant only to different quizzes at different times. Therefore, a more promising approach is to specify a separate indicator for each discussion thread that equals 1 only if a learner has posted on it.
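The thread-specific indicator idea can be illustrated with a sparsity-inducing penalty such as the lasso (Tibshirani, 1996). The sketch below is a simplification under stated assumptions: it fits a plain least-squares lasso by coordinate descent on a tiny synthetic data set, whereas the score model in the paper is not a least-squares regression. The data, the penalty value, and all function names are hypothetical.

```python
from itertools import product


def soft_threshold(value, penalty):
    """Shrink `value` towards zero by `penalty`; values within the penalty become 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_fit(X, y, penalty, n_sweeps=100):
    """Minimise (1/(2n)) * sum_i (y_i - b0 - x_i . b)^2 + penalty * sum_j |b_j|
    by cyclic coordinate descent on centred data. Returns (b0, b)."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    residual = [yi - y_mean for yi in y]  # residual of the all-zero model
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            scale = sum(Xc[i][j] ** 2 for i in range(n)) / n
            if scale == 0.0:
                continue  # constant column carries no information
            # Correlation of column j with the partial residual (column j added back).
            rho = sum(Xc[i][j] * (residual[i] + Xc[i][j] * beta[j]) for i in range(n)) / n
            new_beta = soft_threshold(rho, penalty) / scale
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= Xc[i][j] * delta
                beta[j] = new_beta
    intercept = y_mean - sum(col_means[j] * beta[j] for j in range(p))
    return intercept, beta


if __name__ == "__main__":
    # Hypothetical data: 8 learners, 3 threads; X[i][j] = 1 if learner i posted on thread j.
    X = [list(row) for row in product([0, 1], repeat=3)]
    # Synthetic scores that depend only on participation in thread 0.
    y = [5.0 + 2.0 * row[0] for row in X]
    intercept, beta = lasso_fit(X, y, penalty=0.1)
    print(intercept, beta)
```

In this toy example only the indicator of the first thread is related to the score, so the penalty is expected to shrink the other two coefficients to zero while keeping a (shrunken) effect for the first one. With thousands of threads, a tested implementation and a data-driven choice of the penalty would be preferable to this hand-written loop.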
In other words, we replace the aggregated user degree statistic with a set of thread-specific indicators. Because the number of indicators is large, regularization terms such as the lasso penalty (Tibshirani, 1996) can be used to induce sparsity, i.e. to constrain non-zero network effects to only those threads that are relevant to the quiz of interest. This idea can also be applied to other learning outcomes, such as dropout events, to detect groups of discussion threads that help to retain learners.

The final direction concerns the development of statistical software for REMs. While inference procedures based on partial likelihoods and nested case–control sampling can rely on well-tested R and SAS packages, handling more complex event structures requires a descriptive language to represent relational event streams, define statistics, and specify models. Such a modeling language is necessary to facilitate the application of REMs to other fields where network science is providing methods and insights for a new wave of network data generated by computers and electronic sensors. We are exploring the potential of event processing systems and languages (Luckham, 2002) for implementing such general software for the relational modeling framework.

Regarding our case study of learning processes in MOOCs, the modeling approach developed in this paper is a significant contribution to social learning analytics. While reliable and valid causal conclusions can only be drawn from experimental data, our approach provides a tool to identify potential network effects and generate candidate hypotheses from massive empirical data sets, which can be easily acquired. Findings of this form will ultimately be vital to effective designs and interventions for learning. To facilitate the knowledge sharing process among learners, for example, the estimated model for post events can be used to rank and recommend relevant discussion threads to users (Vu et al., 2011).
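One way such a ranking could work is sketched below: each candidate thread is scored for a given learner by the linear predictor of a fitted post-event model, and the scores are normalised across candidates. This is only an illustration; the coefficient values and statistic names are hypothetical placeholders rather than estimates from this paper, and the normalisation assumes a log-linear event intensity.

```python
import math


def rank_threads(candidate_stats, coefficients, top_k=None):
    """Rank candidate threads for one learner.

    candidate_stats: {thread_id: {statistic_name: value}} for the learner-thread pair.
    coefficients:    {statistic_name: estimated coefficient}.
    Returns a list of (thread_id, relative_probability), highest first.
    """
    scores = {
        thread: sum(coefficients.get(name, 0.0) * value for name, value in stats.items())
        for thread, stats in candidate_stats.items()
    }
    if not scores:
        return []
    # Subtract the maximum before exponentiating for numerical stability.
    max_score = max(scores.values())
    weights = {thread: math.exp(s - max_score) for thread, s in scores.items()}
    total = sum(weights.values())
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    ranked = [(thread, weight / total) for thread, weight in ranked]
    return ranked if top_k is None else ranked[:top_k]


if __name__ == "__main__":
    # Hypothetical coefficients, for illustration only.
    coefficients = {"thread_activity": 0.5, "edge_activity": 1.0, "thread_age": -0.2}
    candidates = {
        "t1": {"thread_activity": 2.0, "edge_activity": 0.0, "thread_age": 1.0},
        "t2": {"thread_activity": 1.0, "edge_activity": 1.0, "thread_age": 3.0},
        "t3": {"thread_activity": 0.0, "edge_activity": 0.0, "thread_age": 0.5},
    }
    for thread, probability in rank_threads(candidates, coefficients):
        print(thread, round(probability, 3))
```

The normalised values can be read as the relative chance that the learner's next post goes to each candidate thread, given that a post occurs and under the stated assumptions.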
This recommendation feature, on one hand, can route important questions to advanced learners and maximize

their contributions. On the other hand, it can help weak learners to navigate quickly to the discussion topics that are most relevant to their current needs. This can be one of the promising applications of network modeling in social learning analytics, where the goals are not only to discover connection patterns among learners but also to promote them. Such applications also require the development of more fine-grained clickstream statistics that can capture the important sequences of events (Mannila et al., 1997) and the application of more advanced content analyses of forum textual documents such as latent topic modeling (Blei et al., 2003).

Acknowledgments

The authors would like to thank members of the Learning Analytics Research Group at the University of Melbourne for their helpful comments during the early phase of our study. This research is funded by Australian Research Council Discovery Project DP120102902.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.socnet.2015.05.001

References

Aalen, O., Borgan, O., Gjessing, H., 2008. Survival and Event History Analysis: A Process Point of View. Springer.
Aalen, O.O., Fosen, J., Weedon-Fekjær, H., Borgan, R., Husebye, E., 2004. Dynamic analysis of multivariate failure time data. Biometrics 60, 764–773.
Agresti, A., 2010. Analysis of Ordinal Categorical Data (Wiley Series in Probability and Statistics), second ed. Wiley.
Andersen, P., Borgan, O., Gill, R., Keiding, N., 1993. Statistical Models Based on Counting Processes. Springer.
Arnold, K.E., Pistilli, M.D., 2012. Course signals at Purdue: using learning analytics to increase student success. In: Proceedings of the 2nd International Conference on Learning Analytics and Knowledge. ACM, New York, NY, USA, pp. 267–270.
Bakharia, A., Dawson, S., 2011. Snapp: a bird’s-eye view of temporal participant interaction. In: Proceedings of the 1st International Conference on Learning Analytics and Knowledge. ACM, New York, NY, USA, pp. 168–173.
Blackmore, C., 2010. Social Learning Systems and Communities of Practice. Springer.
Blei, D., Ng, A., Jordan, M., 2003. Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022.
Borgan, O., Goldstein, L., Langholz, B., 1995. Methods for the analysis of sampled cohort data in the Cox proportional hazards model. Ann. Stat. 23, 1749–1778.
Brandes, U., Lerner, J., Snijders, T.A.B., 2009. Networks evolving step by step: statistical analysis of dyadic event data. In: Proceedings of the 2009 International Conference on Advances in Social Network Analysis and Mining. IEEE Computer Society, Washington, DC, USA, pp. 200–205.
Butts, C., 2008. A relational event framework for social action. Sociol. Methodol. 38, 155–200.
Cox, D.R., 1972. Regression models and life-tables. J. R. Stat. Soc. Ser. B 34, 187–220.
Cressie, N.A.C., 1993. Statistics for Spatial Data, revised ed. Wiley.
Dawson, S., 2010. ‘Seeing’ the learning community: an exploration of the development of a resource for monitoring online student networking. Br. J. Educ. Technol. 41, 736–752.
DuBois, C., Butts, C.T., McFarland, D., Smyth, P., 2013. Hierarchical models for relational event sequences. J. Math. Psychol. 57, 297–309.
Frank, O., Strauss, D., 1986. Markov graphs. J. Am. Stat. Assoc. 81, 832–842.
Haythornthwaite, C., Andrews, R., 2011. E-learning Theory and Practice. Sage.
Haythornthwaite, C., de Laat, M., 2010. Social networks and learning networks: using social network perspectives to understand social learning. In: Proceedings of the 7th International Conference on Networked Learning, Aalborg, Denmark.
Hunter, D.R., Goodreau, S.M., Handcock, M.S., 2013. ergm.userterms: a template package for extending statnet. J. Stat. Softw. 52, 1–25.
Hunter, D.R., Handcock, M.S., 2006. Inference in curved exponential family models for networks. J. Comput. Graph. Stat. 15, 565–583.
Kalbfleisch, J.D., Prentice, R.L., 2002. The Statistical Analysis of Failure Time Data (Wiley Series in Probability and Statistics), second ed. Wiley-Interscience.
Langholz, B., Borgan, O., 1995. Counter-matching: a stratified nested case–control sampling method. Biometrika 82, 69–79.
Laurillard, D., 2001. Rethinking University Teaching: A Conversational Framework for the Effective Use of Learning Technologies. Routledge.
Lomi, A., Mascia, D., Vu, D., Pallotti, F., Conaldi, G., Iwashyna, T.J., 2014. Quality of care and interhospital collaboration: a study of patient transfers in Italy. Med. Care 52, 407–414.
Luckham, D., 2002. The Power of Events: An Introduction to Complex Event Processing in Distributed Enterprise Systems, first ed. Addison-Wesley Professional.
Mannila, H., Toivonen, H., Inkeri Verkamo, A., 1997. Discovery of frequent episodes in event sequences. Data Min. Knowl. Discov. 1, 259–289, http://dx.doi.org/10.1023/A:1009748302351
Martinussen, T., Scheike, T., 2006. Dynamic Regression Models for Survival Data. Springer.
de Nooy, W., 2011. Networks of action and events over time. A multilevel discrete-time event history model for longitudinal network data. Soc. Netw. 33, 31–40.
Perry, P.O., Wolfe, P.J., 2013. Point process modelling for directed interaction networks. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 75, 821–849.
Quintane, E., Pattison, P.E., Robins, G.L., Mol, J.M., 2013. Short- and long-term stability in organizational networks: temporal structures of project teams. Soc. Netw. 35, 528–540.
Robins, G., Pattison, P., Kalish, Y., Lusher, D., 2007. An introduction to exponential random graph (p*) models for social networks. Soc. Netw. 29, 173–191.
Salathé, M., Vu, D., Khandelwal, S., Hunter, D., 2013. The dynamics of health behavior sentiments on a large online social network. EPJ Data Sci. 2, 1–12.
Schweinberger, M., Snijders, T.A., 2007. Markov models for digraph panel data: Monte Carlo-based derivative estimation. Comput. Stat. Data Anal. 51, 4465–4483.
Shum, S.B., Ferguson, R., 2012. Social learning analytics. Educ. Technol. Soc. 15, 3–26.
Siemens, G., 2013. Learning analytics: the emergence of a discipline. Am. Behav. Sci., http://dx.doi.org/10.1177/0002764213498851
Siemens, G., Gasevic, D., 2012. Guest editorial – learning and knowledge analytics. Educ. Technol. Soc. 15, 1–2.
Snijders, T.A.B., 2001. The statistical evaluation of social network dynamics. Sociol. Methodol. 31, 361–395.
Snijders, T.A.B., Bunt, G.G.V.D., Steglich, C.E.G., 2010. Introduction to stochastic actor-based models for network dynamics. Soc. Netw. 32, 44–60.
Snijders, T.A.B., Pattison, P.E., Robins, G.L., Handcock, M.S., 2006. New specifications for exponential random graph models. Sociol. Methodol. 36, 99–153.
Tanes, Z., Arnold, K.E., King, A.S., Remnet, M.A., 2011. Using signals for appropriate feedback: perceptions and practices. Comput. Educ. 57, 2414–2422.
Tibshirani, R., 1996. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. (Ser. B) 58, 267–288.
Vu, D., Zappa, P., Liberati, C., Lomi, A., 2014. Relational event models with time-weighted statistics: an application to counterparty selection in the EU interbank market (in preparation).
Vu, D.Q., Asuncion, A.U., Hunter, D.R., Smyth, P., 2011. Continuous-time regression models for longitudinal networks. In: Proceedings of 25th Annual Conference on Neural Information Processing Systems (NIPS 2011), pp. 2492–2500.
Vu, D.Q., Asuncion, A.U., Hunter, D.R., Smyth, P., 2011. Dynamic egocentric models for citation networks. In: Proceedings of 28th International Conference on Machine Learning (ICML 2011), pp. 857–864.
Wang, P., Pattison, P., Robins, G., 2013. Exponential random graph model specifications for bipartite networks – a dependence hierarchy. Soc. Netw. 35, 211–222.
Wang, P., Robins, G., Pattison, P., Lazega, E., 2013. Exponential random graph models for multilevel networks. Soc. Netw. 35, 96–115.
Yule, G.U., 1925. A mathematical theory of evolution, based on the conclusions of Dr. J. C. Willis, F.R.S. Philos. Trans. R. Soc. Lond. Ser. B, Containing Papers of a Biological Character 213, 21–87.
