History, Features, and Typology of Language Corpora
Title | History, Features, and Typology of Language Corpora |
Author | Niladri Sekhar Dash |
Publisher | Springer |
Pages | 311 |
Release | 2018-02-01 |
Genre | Language Arts & Disciplines |
ISBN | 9811074585 |
This book discusses key issues in corpus linguistics, such as the definition of a corpus, the primary features of a corpus, and the uses and limitations of corpora. It presents a unique classification scheme for language corpora, showing how they can be studied from the perspective of genre, nature, text type, purpose, and application. Any discussion of corpus generation must address parallel translation corpora, and the authors do so thoroughly here, with a focus on Indian language corpora and English. Web-text corpora, a recent development in corpus linguistics, are also discussed with extensive reference to Indian web-text corpora. The book also presents a short history of corpus generation and describes the situation before and after the advent of digital corpora. The book has several important features: it discusses many technical issues of the field lucidly; it contains many new diagrams and charts for easy comprehension; and it presents its discussions in simplified English to meet the needs of non-native English readers. It is an important resource written by academics with many years of experience teaching and researching corpus linguistics. Its focus on Indian language and English corpora makes it useful for graduate and postgraduate students of applied linguistics, computational linguistics, and language processing in South Asia and in countries where English is spoken as a first or second language.
History, Features, and Typology of Language Corpora
Title | History, Features, and Typology of Language Corpora |
Author | Robert Harrell |
Publisher | Createspace Independent Publishing Platform |
Pages | 134 |
Release | 2017-09-12 |
Genre | |
ISBN | 9781984173119 |
New Methods in Historical Corpora
Title | New Methods in Historical Corpora |
Author | Paul Bennett, Martin Durrell, Silke Scheible, Richard J. Whitt |
Publisher | BoD – Books on Demand |
Pages | 286 |
Release | 2013-09-22 |
Genre | Language Arts & Disciplines |
ISBN | 3823367609 |
Investigating the history of a language depends on fragmentary sources. Electronic corpora offer the possibility of alleviating this problem of 'bad data', but they cannot overcome it entirely, and questions arise about the optimal architecture for a corpus, its representativeness of actual language use, and how a historical corpus can best be annotated to maximize its usefulness. Immense strides have been made in recent years in addressing these questions, with exciting new methods and technological advances. The papers in this volume, presented at the conference New Methods in Historical Corpora (Manchester, 2011), exemplify the wide range of these recent developments.
Developing Linguistic Corpora
Title | Developing Linguistic Corpora |
Author | Martin Wynne |
Publisher | Oxbow Books Limited |
Pages | 100 |
Release | 2005 |
Genre | Language Arts & Disciplines |
ISBN | |
A linguistic corpus is a collection of texts which have been selected and brought together so that language can be studied on the computer. Today, corpus linguistics offers some of the most powerful new procedures for the analysis of language, and the impact of this dynamic and expanding sub-discipline is making itself felt in many areas of language study. In this volume, a selection of leading experts in various key areas of corpus construction offer advice in a readable and largely non-technical style to help the reader to ensure that their corpus is well designed and fit for the intended purpose. This guide is aimed at those who are at some stage of building a linguistic corpus. Little or no knowledge of corpus linguistics or computational procedures is assumed, although it is hoped that more advanced users will find the guidelines here useful. It is also aimed at those who are not building a corpus, but who need to know something about the issues involved in the design of corpora in order to choose between available resources and to help draw conclusions from their studies.
Text Types and Corpora
Title | Text Types and Corpora |
Author | Andreas Fischer |
Publisher | Gunter Narr Verlag |
Pages | 252 |
Release | 2002 |
Genre | Computational linguistics |
ISBN | 9783823358800 |
Cross-Linguistic Corpora for the Study of Translations
Title | Cross-Linguistic Corpora for the Study of Translations |
Author | Silvia Hansen-Schirra |
Publisher | Walter de Gruyter |
Pages | 320 |
Release | 2012-12-06 |
Genre | Language Arts & Disciplines |
ISBN | 3110260328 |
The book specifies a corpus architecture, including annotation and querying techniques, and describes its implementation. The architecture was developed for empirical studies of translations and, beyond those, for the study of texts that are inter-lingually comparable, particularly texts of similar registers. The compiled corpus, CroCo, is a research resource and is, subject to some copyright restrictions, accessible to other research projects. Most of the research was undertaken as part of a DFG project on the linguistic properties of translations, which was fundamentally a corpus-based investigation of the language pair English-German. The long-term goal is to contribute to the study of translation as a contact variety and, more generally, to language comparison and language contact, with English and German as the object languages. This goal implies a close interest in the possible specific properties of translations and, beyond this, in an empirical theory of translation. The methodology is not restricted to the traditional, exclusively system-based comparison of earlier days, in which real-text excerpts or constructed examples serve merely to illustrate assumptions and claims. Instead, it implements an empirical research strategy involving structured data (the sub-corpora and their relationships to each other, annotated and aligned on various theoretically motivated levels of representation), the formation of hypotheses and their operationalization, statistics on the data, critical examination of their significance, and interpretation against the background of system-based comparisons and other independent sources of explanation for the observed phenomena. Further applications of the developed resource in computational linguistics are outlined and evaluated.
Using Corpora to Explore Linguistic Variation
Title | Using Corpora to Explore Linguistic Variation |
Author | Randi Reppen |
Publisher | John Benjamins Publishing |
Pages | 294 |
Release | 2002-01-01 |
Genre | Language Arts & Disciplines |
ISBN | 9789027222794 |
Many large-scale investigations of linguistic variation are unfeasible using traditional approaches. This volume is a collection of papers that illustrate the ways in which linguistic variation can be explored through corpus-based investigation.