Sharing Corpus Resources in Language Learning

Angela Chambers (University of Limerick, Ireland) and Martin Wynne (University of Oxford, UK)
DOI: 10.4018/978-1-59904-895-6.ch025

Since the early 1990s, researchers have been investigating the effectiveness of corpora as a resource in language learning, mostly creating their own small corpora. As it is neither feasible nor desirable to envisage a future in which all teachers create their own corpora, and as the content of language courses is similar in many universities throughout the world, the sharing of resources is clearly necessary if corpus data are to be made available to language teachers and learners on a large scale. Taking one small corpus as an example, this chapter aims to investigate the issues arising if corpus consultation is to become an integral part of the language-learning environment. The chapter firstly deals with fundamental questions concerning the creation and reusability of corpora, namely planning, construction, documentation, and also legal, moral and technical issues. It then explores the issues arising from the use of a corpus of familiar texts, in this case a French journalistic corpus, with advanced learners. In conclusion we propose a framework for the optimal use of corpora with language learners in the context of higher education.

Key Terms in this Chapter

Annotation: (see Markup)

Collocation: The tendency of certain words to occur more frequently in the vicinity of particular words in texts. For example, ‘rancid’ tends to occur with ‘butter.’
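As a rough illustration of the definition above, collocates of a word can be approximated by counting the words that appear within a fixed window around each of its occurrences. The function name `collocates`, the `window` parameter and the sample sentence are all invented for this sketch; raw counts of this kind are only a first approximation, and real corpus tools typically apply a statistical association measure to much larger amounts of text.

```python
from collections import Counter


def collocates(tokens, node, window=2):
    """Count the words found within `window` tokens of each occurrence of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # skip the node word itself
                    counts[tokens[j]] += 1
    return counts


# Invented sample text, tokenized naively on whitespace.
text = ("the rancid butter was thrown away because rancid butter "
        "smells and fresh butter does not")
tokens = text.split()
result = collocates(tokens, "rancid", window=1)
print(result.most_common(3))
```

In this toy sample, 'butter' is the most frequent neighbour of 'rancid', which mirrors the example given in the definition.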

Corpus: A collection of naturally occurring data collected for the purpose of a linguistic investigation. A corpus may include materials representing various modes, registers and text types, and it may be possible to isolate these subsets of data, and analyze them separately or contrast them. Such a subdivision of a corpus is known as a subcorpus. A parallel corpus contains texts and translations of those texts, and is compiled in order to analyze and study translations.

Markup: Markup, in the form of tags in a text, is used to add information about the structure of a text and about its linguistic properties. Markup may be used to indicate structural features such as titles and headings, paragraph boundaries and highlighted text, and linguistic features such as lemmas and word classes. Linguistic information which has been added to a corpus in the form of tags is often known as annotation.

Concordance: A list of the occurrences of a word (or other search term), presented one per line along with the immediate surrounding text, in order to display for the analyst a set of examples of the usage of a word, and to enable patterns of usage surrounding the word to be observed. Concordances may be produced by a piece of software known as a concordancer.
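The one-occurrence-per-line layout described above is often called a keyword-in-context (KWIC) display. The following is a minimal sketch of how a concordancer might build it, assuming whitespace tokenization and a context measured in tokens; the names `concordance`, `format_kwic` and `width`, and the sample sentence, are hypothetical and do not correspond to any particular concordancing program.

```python
def concordance(tokens, term, width=3):
    """Return (left context, keyword, right context) for each occurrence of `term`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == term.lower():  # case-insensitive match
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


def format_kwic(lines):
    """Right-align the left contexts so that the keywords line up in a column."""
    pad = max((len(left) for left, _, _ in lines), default=0)
    return [f"{left:>{pad}}  {kw}  {right}" for left, kw, right in lines]


# Invented sample text.
tokens = "The corpus is small but the corpus is useful".split()
hits = concordance(tokens, "corpus", width=2)
for line in format_kwic(hits):
    print(line)
```

Aligning the keyword in a central column is what allows the analyst to scan the words immediately to its left and right for recurring patterns.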

Text Encoding: Text may be captured in electronic form in various ways. Electronic texts are stored in the form of binary data, and will make use of some form of mapping from the binary codes to characters in the language. In the past, various competing standards have existed, with different mappings for different languages and on different computer systems. There is now an international standard, Unicode, which aims to represent all characters in all languages, and be usable on all computer systems. Not all corpora use Unicode, and not all software applications currently make use of it, so difficulties may arise when attempting to share language data.
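The sharing difficulty mentioned above can be shown with a short sketch using only Python's built-in string and bytes methods. The French word is an arbitrary example: the same characters are stored as different bytes under UTF-8 (a Unicode encoding) and Latin-1 (an older single-byte standard), and reading the bytes with the wrong mapping garbles the accented characters.

```python
# "éléphant", written with escapes so the example does not depend on
# the encoding of this file itself.
word = "\u00e9l\u00e9phant"

utf8_bytes = word.encode("utf-8")      # each "é" is stored as two bytes
latin1_bytes = word.encode("latin-1")  # each "é" is stored as one byte

print(len(word), len(utf8_bytes), len(latin1_bytes))

# Decoding with the mapping that was used for encoding recovers the text.
restored = utf8_bytes.decode("utf-8")

# Decoding UTF-8 bytes as Latin-1 does not fail, but yields garbled text.
garbled = utf8_bytes.decode("latin-1")
print(garbled)
```

A corpus distributed without a statement of its encoding leaves the recipient to guess which mapping applies, which is one reason to record the encoding in the corpus documentation.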

Archive: A repository where materials which are considered to be of potential future value are deposited in a secure environment, where their ongoing viability may be monitored. In the case of electronic resources, such as language corpora, a digital archive is required. Digital archives need to ensure the physical security of the data, which may be on a variety of media such as magnetic tape, removable disks and computer disk drives, and need to provide robust backup and disaster recovery facilities. Curation of the data must also ensure that it is stored in formats which remain usable with current software.

Metadata: In corpus linguistics, the information about a corpus and about the constituent texts is known as metadata. Metadata will typically include information about when and by whom a corpus was created, the sampling strategy which was applied to compile the corpus, and information about the texts in the corpus, such as title, author and date of publication. Metadata may be in separate documentation files, or may be inserted in the corpus text files in the form of headers.
