Building and Exploring Web Corpora (WAC3 - 2007)
Title | Building and Exploring Web Corpora (WAC3 - 2007) PDF eBook |
Author | Cédrick Fairon |
Publisher | Presses univ. de Louvain |
Pages | 186 |
Release | 2007 |
Genre | Language Arts & Disciplines |
ISBN | 9782874630828 |
WAC: More and more people are using Web data for linguistic and NLP research. The Web as Corpus workshop (WAC) provides a venue for exploring how we can use it effectively and the advancements to which this could lead. This book is a collection of the talks presented at the 3rd WAC in Louvain-la-Neuve (Belgium). The focus is on the description of Web corpus collection projects, the exploration of Web data characteristics from a linguistics/NLP perspective, and on the use of crawled Web data for NLP purposes. CLEANEVAL: Any use of Web data requires that it be cleaned in order to get rid of unwanted material including, for example, HTML markup, navigation bars, and advertisements. To date there has been no sharing of resources or expertise in this particular domain and the cleaning has often been done minimally. Cleaneval was an exercise aimed at promoting collaboration and improving our understanding of the issues. Results and perspectives are presented in this book.
Web As Corpus
Title | Web As Corpus PDF eBook |
Author | Maristella Gatto |
Publisher | A&C Black |
Pages | 258 |
Release | 2014-02-13 |
Genre | Language Arts & Disciplines |
ISBN | 1472571533 |
Is the internet a suitable linguistic corpus? How can we use it in corpus techniques? What are the special properties that we need to be aware of? This book answers those questions. The Web is an exponentially increasing source of language and corpus linguistics data. From gigantic static information resources to user-generated Web 2.0 content, the breadth and depth of information available is breathtaking – and bewildering. This book explores the theory and practice of the “web as corpus”. It looks at the most common tools and methods used and features a plethora of examples based on the author's own teaching experience. This book also bridges the gap between studies in computational linguistics, which emphasize technical aspects, and studies in corpus linguistics, which focus on the implications for language theory and use.
Information Science and Applications
Title | Information Science and Applications PDF eBook |
Author | Kuinam J. Kim |
Publisher | Springer |
Pages | 1087 |
Release | 2015-02-17 |
Genre | Technology & Engineering |
ISBN | 3662465787 |
This proceedings volume provides a snapshot of the latest issues encountered in technical convergence and the convergence of security technology. It explores how information science is core to most current research, industrial and commercial activities and consists of contributions covering topics including Ubiquitous Computing, Networks and Information Systems, Multimedia and Visualization, Middleware and Operating Systems, Security and Privacy, Data Mining and Artificial Intelligence, Software Engineering, and Web Technology. The proceedings introduce the most recent information technology and ideas, applications and problems related to technology convergence, illustrated through case studies, and review existing converging security techniques. Through this volume, readers will gain an understanding of the current state of the art in information strategies and technologies of convergence security. The intended readership comprises researchers in academia, industry, and other research institutes focusing on information science and technology.
The Routledge Handbook of Vocabulary Studies
Title | The Routledge Handbook of Vocabulary Studies PDF eBook |
Author | Stuart Webb |
Publisher | Routledge |
Pages | 624 |
Release | 2019-07-30 |
Genre | Language Arts & Disciplines |
ISBN | 1000012387 |
The Routledge Handbook of Vocabulary Studies provides a cutting-edge survey of current scholarship in this area. Divided into four sections, which cover understanding vocabulary; approaches to teaching and learning vocabulary; measuring knowledge of vocabulary; and key issues in teaching, researching, and measuring vocabulary, this Handbook: • brings together a wide range of approaches to learning words to provide clarity on how best vocabulary might be taught and learned; • provides a comprehensive discussion of the key issues and challenges in vocabulary studies, with research taken from the past 40 years; • includes chapters on both formulaic language and single-word items; • features original contributions from a range of internationally renowned scholars as well as academics at the forefront of innovative research. The Routledge Handbook of Vocabulary Studies is an essential text for those interested in teaching, learning, and researching vocabulary.
Web Corpus Construction
Title | Web Corpus Construction PDF eBook |
Author | Roland Schäfer |
Publisher | Morgan & Claypool Publishers |
Pages | 197 |
Release | 2013-07-01 |
Genre | Computers |
ISBN | 1627053123 |
The World Wide Web constitutes the largest existing source of texts written in a great variety of languages. A feasible and sound way of exploiting this data for linguistic research is to compile a static corpus for a given language. There are several advantages to this approach: (i) Working with such corpora obviates the problems encountered when using Internet search engines in quantitative linguistic research (such as non-transparent ranking algorithms). (ii) Creating a corpus from web data is virtually free. (iii) The size of corpora compiled from the WWW may exceed by several orders of magnitude the size of language resources offered elsewhere. (iv) The data is locally available to the user, and it can be linguistically post-processed and queried with the tools preferred by her/him. This book addresses the main practical tasks in the creation of web corpora up to giga-token size. Among these tasks are the sampling process (i.e., web crawling) and the usual cleanups including boilerplate removal and removal of duplicated content. Linguistic processing and problems with linguistic processing coming from the different kinds of noise in web corpora are also covered. Finally, the authors show how web corpora can be evaluated and compared to other corpora (such as traditionally compiled corpora).
Using Corpora in Contrastive and Translation Studies
Title | Using Corpora in Contrastive and Translation Studies PDF eBook |
Author | Richard Xiao |
Publisher | Cambridge Scholars Publishing |
Pages | 550 |
Release | 2020-06-12 |
Genre | Language Arts & Disciplines |
ISBN | 1527554848 |
The corpus-based approach has developed into a well-established paradigm in translation studies and has been recognised as a principal reason for the revival of contrastive linguistics since the 1990s, while corpus-based contrastive and translation studies have in turn significantly expanded the scope of corpus linguistics. This book features a selection of twenty-three papers from the 2008 meeting of Using Corpora in Contrastive and Translation Studies (UCCTS), an international conference series launched to provide a forum for the exploration of theoretical and practical issues pertaining to the creation and use of corpora in contrastive and translation studies. The papers in this collection represent the latest developments in corpus-based translation studies, corpus-based contrastive studies, parallel corpus development and bilingual lexicography. They are useful resources for researchers as well as postgraduates and their supervisors in translation studies, comparative and contrastive linguistics, corpus linguistics, and computational linguistics.
Forms of Migration, Migrations of Forms: Language studies
Title | Forms of Migration, Migrations of Forms: Language studies PDF eBook |
Author | Associazione italiana di anglistica. Congresso |
Publisher | |
Pages | 574 |
Release | 2009 |
Genre | Language Arts & Disciplines |
ISBN |