Open Content Means Open Data

When we talk about the importance of open content, a few clear advantages are consistently mentioned, including access, cost, and the ability to remix. Often neglected in the discussion is the data created and collected when learners use online resources. In much the same way that Facebook, Google, and Amazon have built business models by providing online resources and then monetizing the data, we should be aware that the same model exists in education.

This isn’t to say that all open content creators are ignorant of the importance of their data. EdX has made the improvement of online education a central part of its mission. However, we should all take this a step further. First, we should very publicly guarantee the privacy of all data created by learners using our projects: anonymized data will only be given to researchers in accordance with their institution’s research review process and will not be sold under any circumstance. Second, we should be open about the data we are collecting and encourage researchers in the field to make use of our datasets.

For The Mixxer, a social networking site for language learners seeking language exchanges via Skype, this means providing a clear (and extremely short) privacy policy. I also include an invitation to researchers on the About page and will present the type of data available at IMFLIT, a conference on Computer Mediated Communication (CMC), tandem learning, intercultural communication, and foreign language learning.

Compared to many other open education resources, The Mixxer is rather small, with between 30,000 and 40,000 active users per month. However, as a social networking site, I do collect significant data on each user to help them find potential language partners, including their native language(s), the language(s) they are studying, and optionally their age and country of residence. Connected to this data is site activity, including frequency of visits to the site, number of friend requests, and any writing each user has submitted along with the corrections they have received or provided. This data can also be used to send targeted surveys asking users about their language learning. To get a better idea of the type of data that can be collected, see my paper on FLTMag.
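As a rough illustration of how data like this might be prepared before it goes to a researcher, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `UserRecord` class, its field names, and the `pseudonymize` function are hypothetical and do not describe The Mixxer's actual database or export process. Note also that replacing an ID with a salted hash is only pseudonymization, one step among several that real anonymization would require.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class UserRecord:
    """Hypothetical per-user record; field names are illustrative only."""
    user_id: int
    native_languages: List[str]
    studied_languages: List[str]
    age: Optional[int] = None        # optional on the profile
    country: Optional[str] = None    # optional on the profile
    visits_last_month: int = 0
    friend_requests: int = 0


def pseudonymize(record: UserRecord, salt: str) -> dict:
    """Return a research row with the site ID replaced by a salted hash."""
    digest = hashlib.sha256(f"{salt}:{record.user_id}".encode("utf-8")).hexdigest()
    return {
        "pseudonym": digest[:16],  # stable per user for a given salt
        "native_languages": list(record.native_languages),
        "studied_languages": list(record.studied_languages),
        "age": record.age,
        "country": record.country,
        "visits_last_month": record.visits_last_month,
        "friend_requests": record.friend_requests,
    }


# Example with made-up values; the salt would be kept secret by the site owner.
example = UserRecord(user_id=42, native_languages=["en"],
                     studied_languages=["de"], country="US")
row = pseudonymize(example, salt="replace-with-a-secret-value")
```

Because the same salt always maps a given user to the same pseudonym, a researcher could still link that user's activity and writing submissions across tables without ever seeing the site's own user ID.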

I should also mention the kind of data that I cannot or will not provide. For most users, the exchanges themselves happen separately from the site via Skype. While users can message each other on the site, I am not willing to provide the texts of these messages for privacy reasons, and in any case they would not contain the examples of negotiation of meaning seen in many research studies. I also do not have any reliable information on users' level of proficiency in their target language. Surveys could ask about proficiency, but researchers would need either to rely on users' self-assessment or to provide a means of assessment.

If you are interested in using datasets from The Mixxer website, or have questions about using the site as part of a course, please feel free to contact me. I can be reached on Twitter @bryantt.

To learn more about the role of student data in education technology:

Todd Bryant is the liaison to the foreign language departments for the Academic Technology group at Dickinson College and an adjunct instructor of German. Todd created The Mixxer to help connect language students with native speakers. His interests include the immersive effect of games in service of foreign language learning, such as the use of World of Warcraft to teach German.

