Sharing Corpus Resources in Language Learning

Angela Chambers; Martin Wynne

doi:10.4018/978-1-59904-895-6.ch025

Source Title: Handbook of Research on Computer-Enhanced Language Acquisition and Learning

Abstract

Since the early 1990s, researchers have been investigating the effectiveness of corpora as a resource in language learning, mostly creating their own small corpora. As it is neither feasible nor desirable to envisage a future in which all teachers create their own corpora, and as the content of language courses is similar in many universities throughout the world, the sharing of resources is clearly necessary if corpus data are to be made available to language teachers and learners on a large scale. Taking one small corpus as an example, this chapter aims to investigate the issues arising if corpus consultation is to become an integral part of the language-learning environment. The chapter first deals with fundamental questions concerning the creation and reusability of corpora, namely planning, construction and documentation, as well as legal, moral and technical issues. It then explores the issues arising from the use of a corpus of familiar texts, in this case a French journalistic corpus, with advanced learners. In conclusion, we propose a framework for the optimal use of corpora with language learners in the context of higher education.

Key Terms in this Chapter

Annotation: (see Markup)

Collocation: The tendency of certain words to occur more frequently in the vicinity of particular words in texts. For example, ‘rancid’ tends to occur with ‘butter.’
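As a rough illustration of this definition, the sketch below counts which words appear within a fixed window around a chosen word. The function name `collocates`, the window size and the sample sentence are all assumptions made for this example; real collocation studies normally use statistical association measures rather than raw counts.

```python
from collections import Counter

def collocates(tokens, node, window=2):
    """Count the words found within `window` positions of each `node` occurrence."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # skip the node word itself
                counts[tokens[j]] += 1
    return counts

# Invented sample text, tokenized naively on whitespace.
tokens = "the rancid butter smelled bad and the rancid butter was thrown out".split()
neighbours = collocates(tokens, "rancid", window=1)
print(neighbours.most_common(3))
```

On a corpus of realistic size, the same counts would be compared against each word's overall frequency before any claim about collocation is made.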

Corpus: A collection of naturally occurring data collected for the purpose of a linguistic investigation. A corpus may include materials representing various modes, registers and text types, and it may be possible to isolate these subsets of data, and analyze them separately or contrast them. Such a subdivision of a corpus is known as a subcorpus. A parallel corpus contains texts and translations of those texts, and is compiled in order to analyze and study translations.

Markup: Information added to a text in the form of tags, describing the structure of the text and its linguistic properties. Markup may be used to indicate structural features such as titles and headings, paragraph boundaries and highlighted text, and linguistic features such as lemmas and word classes. Linguistic information which has been added to a corpus in the form of tags is often known as annotation.
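The following sketch shows how structural and linguistic tags of this kind might be read by a program. The element and attribute names (`head`, `p`, `w`, `lemma`, `pos`) and the French sample sentence are assumptions chosen for illustration, not the markup scheme of any particular corpus.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: <head> and <p> are structural tags,
# while the lemma and pos attributes on <w> carry linguistic annotation.
fragment = """<text>
  <head>Une r\u00e9forme annonc\u00e9e</head>
  <p><w lemma="le" pos="DET">La</w> <w lemma="r\u00e9forme" pos="NOUN">r\u00e9forme</w> <w lemma="\u00eatre" pos="VERB">sera</w> <w lemma="voter" pos="VERB">vot\u00e9e</w></p>
</text>"""

root = ET.fromstring(fragment)

# Structural information: the heading of the text.
heading = root.findtext("head")

# Linguistic annotation: (word form, lemma, word class) for every tagged word.
annotated = [(w.text, w.get("lemma"), w.get("pos")) for w in root.iter("w")]

# The annotation makes it possible to search by word class rather than by form.
verbs = [form for form, lemma, pos in annotated if pos == "VERB"]
print(heading)
print(verbs)
```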

Concordance: A list of the occurrences of a word (or other search term), presented one per line along with the immediate surrounding text, in order to display for the analyst a set of examples of the usage of a word, and to enable patterns of usage surrounding the word to be observed. Concordances may be produced by a piece of software known as a concordancer.
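A minimal concordancer along these lines can be sketched in a few lines of code. The function name `concordance`, the context width and the sample sentences are assumptions for this example; dedicated concordancing software offers far more, such as sorting and lemma-based search.

```python
import re

def concordance(text, term, width=30):
    """Return one line per occurrence of `term`, with `width` characters of context."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the search term lines up in a column.
        lines.append(f"{left:>{width}} [{match.group()}] {right}")
    return lines

# Invented sample text.
sample = (
    "Le gouvernement a annonc\u00e9 une r\u00e9forme. "
    "La r\u00e9forme sera vot\u00e9e en juin."
)
kwic_lines = concordance(sample, "r\u00e9forme", width=20)
for line in kwic_lines:
    print(line)
```

Aligning the search term in a central column is what allows the analyst to scan the words to its left and right for recurring patterns.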

Text Encoding: Text may be captured in electronic form in various ways. Electronic texts are stored in the form of binary data, and will make use of some form of mapping from the binary codes to characters in the language. In the past, various competing standards have existed, with different mappings for different languages and on different computer systems. There is now an international standard, Unicode, which aims to represent all characters in all languages, and be usable on all computer systems. Not all corpora use Unicode, and not all software applications currently make use of it, so difficulties may arise when attempting to share language data.
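The difficulty described here can be reproduced directly. The sketch below encodes one French word under two mappings, Unicode's UTF-8 and the older Latin-1 standard, and then reads the bytes back with the wrong mapping. The choice of word and of these two encodings is an assumption made for illustration.

```python
# "r\u00e9forme" is the French word for "reform", written with an escaped e-acute.
word = "r\u00e9forme"

utf8_bytes = word.encode("utf-8")      # e-acute is stored as two bytes
latin1_bytes = word.encode("latin-1")  # e-acute is stored as one byte

print(len(utf8_bytes), len(latin1_bytes))

# Reading UTF-8 bytes as if they were Latin-1 silently produces garbled text.
misread = utf8_bytes.decode("latin-1")
print(misread)

# Reading Latin-1 bytes as UTF-8 fails outright unless errors are replaced.
try:
    latin1_bytes.decode("utf-8")
    strict_decode_failed = False
except UnicodeDecodeError:
    strict_decode_failed = True
print(strict_decode_failed)
```

Recording the encoding of each file in the corpus documentation avoids this kind of guesswork when the data are shared.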

Archive: A repository where materials which are considered to be of potential future value are deposited in a secure environment, where their ongoing viability may be monitored. In the case of electronic resources, such as language corpora, a digital archive is required. Digital archives need to ensure the physical security of the data, which may be on a variety of media such as magnetic tape, removable disks and computer disk drives, and need to provide robust backup and disaster recovery facilities. Curation of the data must also ensure that it is stored in formats which are usable with current software.

Metadata: In corpus linguistics, the information about a corpus and about the constituent texts is known as metadata. Metadata will typically include information about when and by whom a corpus was created, the sampling strategy which was applied to compile the corpus, and information about the texts in the corpus, such as title, author and date of publication. Metadata may be in separate documentation files, or may be inserted in the corpus text files in the form of headers.
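To illustrate metadata stored as a header inside a corpus text file, the sketch below reads the header fields of one document into a dictionary. The element names (`doc`, `header`, `title`, `author`, `date`, `body`) and every value in the sample are invented placeholders, not taken from any real corpus.

```python
import xml.etree.ElementTree as ET

# Hypothetical corpus file: a metadata header followed by the text itself.
document = """<doc>
  <header>
    <title>Example article</title>
    <author>A. N. Author</author>
    <date>2003-05-14</date>
  </header>
  <body>Texte de l'article.</body>
</doc>"""

root = ET.fromstring(document)

# Collect each header field as a name/value pair.
metadata = {field.tag: field.text for field in root.find("header")}
print(metadata)

# Metadata can then be used to select texts, for example by year of publication.
year = int(metadata["date"][:4])
in_subcorpus = 2000 <= year <= 2005
print(in_subcorpus)
```

Selecting texts by header fields in this way is one route to isolating a subcorpus, for example all articles from a given period.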
