Web Corpus Construction

Title Web Corpus Construction
Author Roland Schäfer
Publisher Morgan & Claypool Publishers
Pages 197
Release 2013-07-01
Genre Computers
ISBN 1627053123

The World Wide Web constitutes the largest existing source of texts written in a great variety of languages. A feasible and sound way of exploiting this data for linguistic research is to compile a static corpus for a given language. This approach has several advantages: (i) Working with such corpora obviates the problems encountered when using Internet search engines in quantitative linguistic research (such as non-transparent ranking algorithms). (ii) Creating a corpus from web data is virtually free. (iii) The size of corpora compiled from the WWW may exceed by several orders of magnitude the size of language resources offered elsewhere. (iv) The data is locally available to the user, and it can be linguistically post-processed and queried with the tools preferred by her/him. This book addresses the main practical tasks in the creation of web corpora up to giga-token size. Among these tasks are the sampling process (i.e., web crawling) and the usual cleanups including boilerplate removal and removal of duplicated content. Linguistic processing, and the problems that the different kinds of noise in web corpora cause for it, are also covered. Finally, the authors show how web corpora can be evaluated and compared to other corpora (such as traditionally compiled corpora).

Human Language Technologies – The Baltic Perspective

Title Human Language Technologies – The Baltic Perspective
Author K. Muischnek
Publisher IOS Press
Pages 208
Release 2018-09-28
Genre Computers
ISBN 1614999120

Computational linguistics, speech processing, natural language processing and language technologies in general have all become increasingly important in an era of all-pervading technological development. This book, Human Language Technologies – The Baltic Perspective, presents the proceedings of the 8th International Baltic Human Language Technologies Conference (Baltic HLT 2018), held in Tartu, Estonia, on 27-29 September 2018. The main aim of Baltic HLT is to provide a forum for sharing new ideas and recent advances in computational linguistics and related disciplines, and to promote cooperation between the research communities of the Baltic States and beyond. The 24 articles in this volume cover a wide range of subjects, including machine translation, automatic morphology, text classification, various language resources, and NLP pipelines, as well as speech technology, the latter being the most popular topic with 8 papers. Delivering an overview of state-of-the-art language technologies from a Baltic perspective, the book will be of interest to all those whose work involves language processing in whatever form.

Linguistic Structure Prediction

Title Linguistic Structure Prediction
Author Noah A. Smith
Publisher Springer Nature
Pages 248
Release 2022-05-31
Genre Computers
ISBN 3031021436

A major part of natural language processing now depends on the use of text data to build linguistic analyzers. We consider statistical, computational approaches to modeling linguistic structure. We seek to unify across many approaches and many kinds of linguistic structures. Assuming a basic understanding of natural language processing and/or machine learning, we seek to bridge the gap between the two fields. Approaches to decoding (i.e., carrying out linguistic structure prediction) and supervised and unsupervised learning of models that predict discrete structures as outputs are the focus. We also survey natural language processing problems to which these methods are being applied, and we address related topics in probabilistic inference, optimization, and experimental methodology. Table of Contents: Representations and Linguistic Data / Decoding: Making Predictions / Learning Structure from Annotated Data / Learning Structure from Incomplete Data / Beyond Decoding: Inference

Human Language Technologies

Title Human Language Technologies
Author Inguna Skadina
Publisher IOS Press
Pages 264
Release 2010
Genre Computers
ISBN 1607506408

This book contains papers from the Fourth International Conference on Human Language Technologies - the Baltic Perspective (Baltic HLT 2010), held in Riga in October 2010. This conference is the latest in a series which provides a forum for sharing recent advances in human language processing, and promotes cooperation between the computer science and linguistics communities of the Baltic countries and the rest of the world. Bringing together scientists, developers, providers and users, the conference is an opportunity to exchange information, discuss problems, find new synergies, and promote i.

Introduction to Arabic Natural Language Processing

Title Introduction to Arabic Natural Language Processing
Author Nizar Y. Habash
Publisher Springer Nature
Pages 170
Release 2022-06-01
Genre Computers
ISBN 3031021398

This book provides system developers and researchers in natural language processing and computational linguistics with the necessary background information for working with the Arabic language. The goal is to introduce Arabic linguistic phenomena and review the state of the art in Arabic processing. The book discusses Arabic script, phonology, orthography, morphology, syntax and semantics, with a final chapter on machine translation issues. The chapter sizes correspond more or less to what is linguistically distinctive about Arabic, with morphology getting the lion's share, followed by Arabic script. No previous knowledge of Arabic is needed. This book is designed for computer scientists and linguists alike. The focus of the book is on Modern Standard Arabic; however, notes on practical issues related to Arabic dialects and languages written in the Arabic script are presented in different chapters. Table of Contents: What is "Arabic"? / Arabic Script / Arabic Phonology and Orthography / Arabic Morphology / Computational Morphology Tasks / Arabic Syntax / A Note on Arabic Semantics / A Note on Arabic and Machine Translation

Natural Language Processing for Historical Texts

Title Natural Language Processing for Historical Texts
Author Michael Piotrowski
Publisher Morgan & Claypool Publishers
Pages 160
Release 2012
Genre Computers
ISBN 1608459462

This book provides an introduction to natural language processing (NLP) for historical texts and an overview of the state of the art in this field. It offers an overview of methods for the acquisition of historical texts, discusses specific methods, and analyses the relationship between NLP and the digital humanities.

Ontology-Based Interpretation of Natural Language

Title Ontology-Based Interpretation of Natural Language
Author Philipp Cimiano
Publisher Springer Nature
Pages 158
Release 2022-06-01
Genre Computers
ISBN 3031021541

For humans, understanding a natural language sentence or discourse is so effortless that we hardly ever think about it. For machines, however, the task of interpreting natural language, especially grasping meaning beyond the literal content, has proven extremely difficult and requires a large amount of background knowledge. This book focuses on the interpretation of natural language with respect to specific domain knowledge captured in ontologies. The main contribution is an approach that puts ontologies at the center of the interpretation process. This means that ontologies not only provide a formalization of domain knowledge necessary for interpretation but also support and guide the construction of meaning representations. We start with an introduction to ontologies and demonstrate how linguistic information can be attached to them by means of the ontology lexicon model lemon. These lexica then serve as the basis for the automatic generation of grammars, which we use to compositionally construct meaning representations that conform to the vocabulary of an underlying ontology. As a result, the level of representational granularity is not driven by language but by the semantic distinctions made in the underlying ontology and thus by distinctions that are relevant in the context of a particular domain. We highlight some of the challenges involved in the construction of ontology-based meaning representations, and show how ontologies can be exploited for ambiguity resolution and the interpretation of temporal expressions. Finally, we present a question answering system that combines all tools and techniques introduced throughout the book in a real-world application, and sketch how the presented approach can scale to larger, multi-domain scenarios in the context of the Semantic Web.
Table of Contents: List of Figures / Preface / Acknowledgments / Introduction / Ontologies / Linguistic Formalisms / Ontology Lexica / Grammar Generation / Putting Everything Together / Ontological Reasoning for Ambiguity Resolution / Temporal Interpretation / Ontology-Based Interpretation for Question Answering / Conclusion / Bibliography / Authors' Biographies