台語文處理技術:以變調及詞性標記為例 Processing Techniques for Written Taiwanese -- Tone Sandhi and POS Tagging
Title | 台語文處理技術:以變調及詞性標記為例 Processing Techniques for Written Taiwanese -- Tone Sandhi and POS Tagging PDF eBook |
Author | |
Publisher | Ungian Iunn 楊允言 |
Pages | 167 |
Release | |
Genre | |
ISBN |
Processing Techniques for Written Taiwanese
Title | Processing Techniques for Written Taiwanese PDF eBook |
Author | 楊允言 |
Publisher | |
Pages | 139 |
Release | 2009 |
Genre | |
ISBN |
Chinese Spoken Language Processing
Title | Chinese Spoken Language Processing PDF eBook |
Author | Qiang Huo |
Publisher | Springer |
Pages | 825 |
Release | 2006-11-30 |
Genre | Computers |
ISBN | 3540496661 |
This book constitutes the thoroughly refereed proceedings of the 5th International Symposium on Chinese Spoken Language Processing, ISCSLP 2006, held in Singapore in December 2006, co-located with ICCPOL 2006, the 21st International Conference on Computer Processing of Oriental Languages. Coverage includes speech science, acoustic modeling for automatic speech recognition, speech data mining, and machine translation of speech.
Weakly Supervised Part-of-speech Tagging for Chinese Using Label Propagation
Title | Weakly Supervised Part-of-speech Tagging for Chinese Using Label Propagation PDF eBook |
Author | Weiwei Ding |
Publisher | |
Pages | 118 |
Release | 2011 |
Genre | |
ISBN |
Part-of-speech (POS) tagging is one of the most fundamental and crucial tasks in Natural Language Processing. Chinese POS tagging is challenging because it also involves word segmentation. This report focuses on how to improve unsupervised POS tagging using Hidden Markov Models and the Expectation Maximization parameter estimation approach (EM-HMM). The traditional EM-HMM system uses a tag dictionary to constrain possible tag sequences and to initialize the model parameters. This is a very crude initialization: the emission parameters are set uniformly in accordance with the tag dictionary. To improve this, word alignments can be used. Word alignments are the word-level translation correspondent pairs generated from parallel text between two languages. In this report, Chinese-English word alignment is used. The performance is expected to be better, as the two knowledge sources are complementary: the dictionary provides information on word types, while word alignment provides information on word tokens. However, word alignment is found to be of limited benefit. In this report, another method is proposed. To improve the dictionary coverage and get a better POS distribution, Modified Adsorption, a label propagation algorithm, is used. We construct a graph connecting word tokens to feature types (such as word unigrams and bigrams) and connecting those tokens to information from knowledge sources, such as a small tag dictionary, Wiktionary, and word alignments. The core idea is to use a small amount of supervision, in the form of a tag dictionary, to acquire POS distributions for each word (both known and unknown) and to provide these as an improved initialization for EM learning of the HMM. We find this strategy to work very well, especially when we have a small tag dictionary. Label propagation provides a better initialization for the EM-HMM method because it greatly increases the coverage of the dictionary. In addition, label propagation is flexible enough to incorporate many kinds of knowledge. However, results also show that some resources, such as the word alignments, are not easily exploited with label propagation.
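The "crude" baseline mentioned in the abstract, uniform emission parameters constrained by a tag dictionary, can be illustrated with a minimal sketch. This is not code from the report: the function name `init_emissions`, the toy vocabulary, and the choice to let out-of-dictionary words take any tag are all assumptions made for illustration.

```python
def init_emissions(tag_dict, vocab, tags):
    """Build uniform emission tables P(word | tag) from a tag dictionary.

    Assumption: a word listed in tag_dict may only be emitted by its listed
    tags; a word missing from tag_dict may be emitted by any tag.
    """
    # Collect, for every tag, the words that tag is allowed to emit.
    words_for_tag = {t: [] for t in tags}
    for word in vocab:
        for t in tag_dict.get(word, tags):
            words_for_tag[t].append(word)

    # Spread each tag's probability mass evenly over its allowed words.
    emissions = {}
    for t, words in words_for_tag.items():
        emissions[t] = {w: 1.0 / len(words) for w in words}
    return emissions


# Toy example (hypothetical data): "blorp" is not in the dictionary.
tags = ["N", "V"]
vocab = ["dog", "run", "blorp"]
tag_dict = {"dog": ["N"], "run": ["N", "V"]}

emissions = init_emissions(tag_dict, vocab, tags)
for t in tags:
    print(t, emissions[t])
```

Under these assumptions, every tag treats all of its permitted words as equally likely, which is why the abstract describes this starting point as crude: it carries no information about which tag a word prefers. The label propagation step described above is meant to replace these flat tables with better-informed distributions before EM training.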
Developing Linguistic Corpora
Title | Developing Linguistic Corpora PDF eBook |
Author | Martin Wynne |
Publisher | Oxbow Books Limited |
Pages | 100 |
Release | 2005 |
Genre | Language Arts & Disciplines |
ISBN |
A linguistic corpus is a collection of texts which have been selected and brought together so that language can be studied on the computer. Today, corpus linguistics offers some of the most powerful new procedures for the analysis of language, and the impact of this dynamic and expanding sub-discipline is making itself felt in many areas of language study. In this volume, a selection of leading experts in various key areas of corpus construction offer advice in a readable and largely non-technical style to help the reader to ensure that their corpus is well designed and fit for the intended purpose. This guide is aimed at those who are at some stage of building a linguistic corpus. Little or no knowledge of corpus linguistics or computational procedures is assumed, although it is hoped that more advanced users will find the guidelines here useful. It is also aimed at those who are not building a corpus, but who need to know something about the issues involved in the design of corpora in order to choose between available resources and to help draw conclusions from their studies.
Handbook of Natural Language Processing
Title | Handbook of Natural Language Processing PDF eBook |
Author | Nitin Indurkhya |
Publisher | CRC Press |
Pages | 704 |
Release | 2010-02-22 |
Genre | Business & Economics |
ISBN | 142008593X |
The Handbook of Natural Language Processing, Second Edition presents practical tools and techniques for implementing natural language processing in computer systems. Along with removing outdated material, this edition updates every chapter and expands the content to include emerging areas, such as sentiment analysis.
Introduction to Embedded Systems, Second Edition
Title | Introduction to Embedded Systems, Second Edition PDF eBook |
Author | Edward Ashford Lee |
Publisher | MIT Press |
Pages | 562 |
Release | 2017-01-06 |
Genre | Computers |
ISBN | 0262340526 |
An introduction to the engineering principles of embedded systems, with a focus on modeling, design, and analysis of cyber-physical systems. The most visible use of computers and software is processing information for human consumption. The vast majority of computers in use, however, are much less visible. They run the engine, brakes, seatbelts, airbag, and audio system in your car. They digitally encode your voice and construct a radio signal to send it from your cell phone to a base station. They command robots on a factory floor, power generation in a power plant, processes in a chemical plant, and traffic lights in a city. These less visible computers are called embedded systems, and the software they run is called embedded software. The principal challenges in designing and analyzing embedded systems stem from their interaction with physical processes. This book takes a cyber-physical approach to embedded systems, introducing the engineering concepts underlying embedded systems as a technology and as a subject of study. The focus is on modeling, design, and analysis of cyber-physical systems, which integrate computation, networking, and physical processes. The second edition offers two new chapters, several new exercises, and other improvements. The book can be used as a textbook at the advanced undergraduate or introductory graduate level and as a professional reference for practicing engineers and computer scientists. Readers should have some familiarity with machine structures, computer programming, basic discrete mathematics and algorithms, and signals and systems.