Engagement Modeling

Current approaches to learning analytics focus mainly on providing feedback to learners about their knowledge states, based on their responses to assessment questions. Accounting for additional cognitive factors (most importantly, learner engagement) has the potential to yield more effective learning analytics and feedback. However, measuring these factors has remained difficult. Recently, online learning platforms such as massive open online courses (MOOCs) have emerged that can collect behavioral data, which provides some indicators of these factors.

Existing approaches to measuring learner engagement can be roughly divided into two categories: device-based and activity-based. Device-based approaches measure learner engagement using devices external to the learning platform, such as cameras that record facial expressions and eye-tracking devices that detect mind wandering while learners read text documents. These approaches require the learning platform to integrate external devices, and they are invasive. Activity-based approaches, on the other hand, measure engagement using heuristic features constructed from learners' activity logs. These approaches rely on heuristic definitions of engagement that are not guaranteed to correlate with learner performance.

In this work, we propose a probabilistic model that infers a learner's engagement level by treating it as a latent variable that drives the learner's performance and is, in turn, driven by the learner's behavior. We apply our framework to a dataset collected from the fall 2012 offering of the Princeton University course Networks: Friends, Money, and Bytes on Coursera. The dataset consists of clickstream actions generated as learners watch lecture videos, together with learners' responses to in-video quiz questions. Our measure of learner engagement (i) is computed solely from clickstream data, which can be collected entirely within the online learning platform, and (ii) directly correlates with performance.

We first summarize the raw learner clickstream actions into 9 features:
the fraction of time the learner spent on the video (relative to its length),
the fraction of the video that the learner completed,
the fraction of the video that the learner played,
the fraction of time the learner stayed paused on the video,
the number of times the learner paused the video,
the number of times the learner skipped backwards,
the number of times the learner skipped forward,
the time-average of the learner's playback rate throughout the video, and
the standard deviation of the learner's playback rate.
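The nine features above can be computed from a per-video summary of a learner's clickstream. The sketch below is a minimal illustration, not the authors' preprocessing code, and it rests on several assumptions: the `VideoSession` record (played segments with their playback rates, pause durations, and seek jumps) is a hypothetical format; "fraction completed" is interpreted as the fraction of the video covered at least once, while "fraction played" counts repeated segments; and the playback-rate statistics are weighted by wall-clock time.

```python
import math
from dataclasses import dataclass, field


@dataclass
class VideoSession:
    """Hypothetical, simplified summary of one learner's clickstream on one video."""
    video_length: float  # video length in seconds
    segments: list = field(default_factory=list)  # (start_pos, end_pos, rate) intervals actually played
    pause_durations: list = field(default_factory=list)  # wall-clock seconds spent in each pause
    seeks: list = field(default_factory=list)  # (from_pos, to_pos) jumps in video position


def covered_length(intervals):
    """Length of the union of [start, end] intervals, so repeated segments count once."""
    total, cur_start, cur_end = 0.0, None, None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start  # close the previous merged interval
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # overlapping interval: extend the current one
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def behavioral_features(s: VideoSession) -> list:
    """Return the nine features in the order listed above (assumed definitions)."""
    length = s.video_length
    played = [end - start for start, end, _ in s.segments]  # video seconds played, repeats included
    wall = [(end - start) / rate for start, end, rate in s.segments]  # wall-clock seconds per segment
    rates = [rate for _, _, rate in s.segments]
    play_time, pause_time = sum(wall), sum(s.pause_durations)

    if play_time > 0:
        # Time-weighted mean and standard deviation of the playback rate.
        mean_rate = sum(r * w for r, w in zip(rates, wall)) / play_time
        var_rate = sum(w * (r - mean_rate) ** 2 for r, w in zip(rates, wall)) / play_time
    else:
        mean_rate, var_rate = 0.0, 0.0  # nothing played: report zeros (an arbitrary choice)

    return [
        (play_time + pause_time) / length,  # fraction of time spent, relative to video length
        covered_length([(a, b) for a, b, _ in s.segments]) / length,  # fraction completed
        sum(played) / length,  # fraction played
        pause_time / length,  # fraction of time paused
        len(s.pause_durations),  # number of pauses
        sum(1 for a, b in s.seeks if b < a),  # number of backward skips
        sum(1 for a, b in s.seeks if b > a),  # number of forward skips
        mean_rate,  # time-averaged playback rate
        math.sqrt(var_rate),  # standard deviation of the playback rate
    ]
```

For example, a learner who plays seconds 0 to 50 at normal speed, rewinds to second 30, and plays to second 80 at 1.5x has played the whole video's length once (fraction played 1.0) but has covered only 80% of it (fraction completed 0.8).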

Using these behavioral features, we then propose two probabilistic models: a response model and a learning model. The response model relates learners' responses to quiz questions to their latent concept knowledge states. The learning model characterizes the increase in learners' knowledge induced by watching videos; the learning gain is proportional to our notion of engagement, which is modeled as a function of the behavioral features. Using cross-validation, we found that our model outperforms two existing state-of-the-art models in predicting unobserved learner responses to quiz questions: the sparse factor analysis (SPARFA) model and an advanced version of the Bayesian knowledge tracing (BKT) model.
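The interplay between the two models can be sketched as follows. This is a minimal illustration under assumptions that this summary does not spell out: knowledge is a vector over latent concepts; engagement is a logistic function of a weighted sum of the nine features (weights `beta`); the learning model adds a per-video gain vector scaled by engagement; and the response model uses a SPARFA-style logistic link on the concept-weighted knowledge minus a question difficulty. The function names and this exact parameterization are hypothetical, not the paper's published equations.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function mapping a real number to (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dot(a, b) -> float:
    return sum(x * y for x, y in zip(a, b))


def engagement(features, beta) -> float:
    """Assumed engagement level in (0, 1): logistic of a weighted sum of the behavioral features."""
    return sigmoid(dot(beta, features))


def learn(knowledge, video_gain, features, beta):
    """Learning-model sketch: the knowledge gain from a video is proportional to engagement."""
    e = engagement(features, beta)
    return [c + e * d for c, d in zip(knowledge, video_gain)]


def p_correct(knowledge, concept_weights, difficulty) -> float:
    """Response-model sketch (assumed logistic link): probability of a correct response."""
    return sigmoid(dot(concept_weights, knowledge) - difficulty)


def sequence_log_likelihood(steps, beta, initial_knowledge) -> float:
    """Log-likelihood of one learner's quiz responses over a sequence of videos.

    Each step is (features, video_gain, questions), where questions is a list of
    (concept_weights, difficulty, y) with y = 1 for a correct answer. As a
    simplification, the quiz is scored after the knowledge update for its video.
    """
    knowledge = list(initial_knowledge)
    total = 0.0
    for features, video_gain, questions in steps:
        knowledge = learn(knowledge, video_gain, features, beta)
        for weights, difficulty, y in questions:
            p = p_correct(knowledge, weights, difficulty)
            total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total
```

Under this sketch, fitting amounts to maximizing the log-likelihood over `beta`, the video gains, and the question parameters. Scoring held-out responses with `p_correct` is one way to carry out a cross-validation comparison against other models.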

Moreover, we analyze the correlation between each behavioral feature and engagement.
All of the features except the number of forward skips are positively correlated with the latent engagement level.
The features that contribute most to high latent engagement levels are the number of pauses, the number of backward skips (rewinds), and the average playback rate.
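A per-feature analysis of this kind can be sketched as follows. The sketch assumes that an engagement level has already been inferred for each learner-video pair and that feature vectors are ordered as in the feature list above. It uses Pearson correlation as an illustrative choice; this summary does not state which correlation measure the authors used. The feature names are hypothetical labels.

```python
import math

# Hypothetical short names for the nine features, in the order listed earlier.
FEATURE_NAMES = [
    "fraction_spent", "fraction_completed", "fraction_played", "fraction_paused",
    "num_pauses", "num_backward_skips", "num_forward_skips",
    "mean_playback_rate", "std_playback_rate",
]


def pearson(x, y) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined for a constant column; report 0 by convention here
    return sxy / math.sqrt(sxx * syy)


def feature_engagement_correlations(feature_rows, engagement_levels) -> dict:
    """Correlate each feature column with the inferred engagement levels."""
    columns = list(zip(*feature_rows))  # transpose rows of 9 features into 9 columns
    return {name: pearson(col, engagement_levels) for name, col in zip(FEATURE_NAMES, columns)}


def strongest(correlations: dict, k: int = 3) -> list:
    """Names of the k features with the largest absolute correlation."""
    ranked = sorted(correlations.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return [name for name, _ in ranked[:k]]
```

The sign of each coefficient indicates whether a feature rises or falls with engagement, and ranking by magnitude gives a simple notion of which features matter most.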

Furthermore, we visualize the evolution of learner engagement over time. We divide learners into three types according to their engagement patterns and plot their engagement levels over time.
The first type of learner finishes the course and consistently exhibits high engagement levels throughout.
The second type also consistently exhibits high engagement levels but drops out of the course within the first three weeks.
The third type exhibits inconsistent engagement levels before dropping out early.


A. S. Lan, C. G. Brinton, T. Yang, and M. Chiang, "Behavior-Based Latent Variable Model for Learner Engagement," Proc. International Conference on Educational Data Mining (EDM), 2017, to appear.