5 Ways to Open Up Corpora for Language Learning

Corpora developed by linguists to study languages are a promising source of authentic materials to employ in the development of OER for language learning. Recently, COERLL’s SpinTX Corpus-to-Classroom project launched a new open resource that seeks to make it easy to search and adapt materials from a video corpus.

The SpinTX video archive provides a pedagogically friendly web interface to search hundreds of videos from the Spanish in Texas Corpus. Each of the videos is accompanied by synchronized closed captions and a transcript that has been annotated with thematic, grammatical, functional and metalinguistic information. Educators using the site can also tag videos for features that match their interests, and share favorite videos in playlists.

A collaboration among educators, professional linguists, and technologists, the SpinTX project leverages different aspects of the “openness” movement including open research, open data, open source software, and open education. It is our hope that by opening up this corpus, and by sharing the strategies and tools we used to develop it, others may be able to replicate and build on our work in other contexts.

So, how do we make a corpus open and beneficial across communities? Here are 5 ways:

1. Create an open and accessible search interface

Minimize barriers to your content. Searching the SpinTX video archive requires no registration, passwords or fees. To maximize accessibility, think about your audience’s context and needs. The SpinTX video archive offers a corpus interface specifically for educators, and the team plans to create a different interface for researchers.

2. Use open content licenses

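Add a Creative Commons license to your corpus materials. The SpinTX video archive uses a CC BY-NC-SA license, which requires attribution but allows others to reuse and adapt the materials in different contexts for non-commercial purposes, provided they share any adaptations under the same license.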

3. Make your data open and share content

Allow others to easily embed or download your content and data. The SpinTX video archive provides social sharing buttons for each video, as well as access to the source data (tagged transcripts) through Google Fusion Tables.

4. Embrace open source development

When possible, use and build upon open source tools. The SpinTX project was developed using a combination of open source software (e.g. TreeTagger, Drupal) and open APIs (e.g. YouTube Captioning API). Custom code developed for the project is openly shared through a GitHub repository.

5. Make project documentation open

Make it easy for others to replicate and build on your work. The SpinTX team is publishing its research protocols, development processes and methodologies, and other project documentation on the SpinTX Corpus-to-Classroom blog.

Openly sharing language corpora may have wide-ranging benefits for diverse communities of researchers, educators and language learners, as well as for the general public. The SpinTX team is interested in starting a conversation across these communities. Have you ever used a corpus before? What did you use it for? If you have never used a corpus, how do you find and use authentic videos in the classroom? How can we make video corpora more accessible and useful for teachers and learners?

Rachael Gilg is the Project Manager and Lead Developer for COERLL’s Spanish in Texas Corpus project and the SpinTX Corpus-to-Classroom project. She has acted as project manager, designer, and developer on a diverse set of projects, including educational websites and online courses, video and interactive media, digital archives, and social/community websites.

Best of MERLOT: Award-Winning World Language Resources

In my last post, I blogged about the de rigueur French sites I share with my community college students through the Multimedia Educational Resource for Learning and Online Teaching (MERLOT). In addition to these, I must mention that there are almost 2,500 World Languages materials in MERLOT, not just in French, but in Arabic, Chinese, ESL, German, Hindi, Italian, Japanese, Latin, Portuguese, Spanish and many other languages. There are simulations, animations, blogs, word clouds, virtual art galleries and recording studios, tutorials, videos, webquests and worksheets. The cost is just a bit of your time.

One of the most effective ways to find the best of MERLOT is by exploring the recipients of our World Languages Editor’s Choice and MERLOT Classics Awards. The Classics Award winners are chosen from among outstanding online resources designed to enhance teaching and learning. The Editor’s Choice Award is an honor bestowed on one excellent learning material among all the Classics Award winners. An easy way to peruse all the award-winning resources is to visit the About MERLOT Awards/Exemplary Materials page.

Top 3 Editor’s Choice Recipients
  1. LangMedia consists of a collection of target language videos made by international students from the Five Colleges of Massachusetts in their home countries. Videos in languages from Arabic to Wolof are included with transcripts, images and realia. See videos of French as it is spoken in a variety of Francophone nations, Spanish in the Spanish-speaking world, etc. There is also a substantial Bangla/Bengali collection, along with Czech, Croatian and on through the alphabet of languages. In addition to the language videos, there are also CultureTalk series that are coded for elementary, middle school and high school classes. These resources can enhance language courses anywhere or be used by prospective travelers to the regions.
  2. Ojalá que llueva café is a timeless favorite of Spanish teachers and learners everywhere for its embedding of culture, grammar and structure. Completely in the target language, it not only contains a glossed reading of the popular song by Juan Luis Guerra, it also features a beautiful photo gallery of the Dominican Republic and many exercises to teach the subjunctive in an engaging way. Author Barbara K. Nelson went on to create many modules using a similar format in her five-star Spanish.language&culture site.
  3. Lingu@net Worldwide (formerly Lingu@netEuropa) catalogues some 3,500 resources, all geared toward learning languages. Lingu@net Worldwide allows users to discern their learning styles, to find conversation partners and to locate resources to enhance their knowledge of the target language and culture. The resources it points to reach a wide and diverse potential audience: casual learners of languages in a variety of age groups, students of languages for professional or academic reasons and others.

I hope this tour of the best of MERLOT inspires Open Up readers to submit their own work to MERLOT World Languages and to comment on what they find in our collections. For instance, what features would you like to see that are not yet in MERLOT?

Laura Franklin teaches French online at the Extended Learning Institute, Northern Virginia Community College. She is one of the original Co-Editors of MERLOT World Languages. For information on becoming a MERLOT World Languages Peer Reviewer, contact Laura at lfranklin@nvcc.edu.

To find more OER for languages, see Open Up on Open Education Week.