Probabilistic Indexing for Information Search and Retrieval in Large Collections of Handwritten Text Images

Title Probabilistic Indexing for Information Search and Retrieval in Large Collections of Handwritten Text Images
Author
Publisher Springer Nature
Pages 372
Release 2024
Genre Automatic indexing
ISBN 3031553896

This book provides a comprehensive presentation of a recently introduced framework, named "probabilistic indexing" (PrIx), for searching text in large collections of document images and other related applications. It fosters the development of new search engines for effective information retrieval from manuscripts which, however, lack the electronic text (transcripts) that would typically be required for such search and retrieval tasks. The book is structured into 11 chapters and three appendices. The first two chapters briefly outline the necessary fundamentals and state of the art in pattern recognition, statistical decision theory, and handwritten text recognition. Chapter 3 presents approaches for indexing (as opposed to spotting) each region of a handwritten text image which is likely to contain a word. Next, Chapter 4 describes models adopted for handwritten text in images, namely hidden Markov models, convolutional and recurrent neural networks and language models, and provides full details of weighted finite-state transducer (WFST) concepts and methods, needed in further chapters of the book. Chapter 5 explains the set of techniques and algorithms developed to generate image probabilistic indexes which allow for fast search and retrieval of textual information in the indexed images. Chapter 6 then presents experimental evaluations of the proposed framework and algorithms on different traditional benchmark datasets and compares them with other approaches, while Chapter 7 reviews the most popular keyword-spotting approaches. Chapter 8 explains how PrIx can support classical free-text search tools, while Chapter 9 presents new methods that use PrIx not only for searching, but also to deal with text analytics and other related natural language processing and information extraction tasks. 
Chapter 10 shows how the proposed solutions can be used to effectively index very large collections of handwritten document images, before Chapter 11 summarizes the book and suggests promising lines of future research. The appendices detail the necessary mathematical foundations for the work and present details of the text image collections and datasets used in the experiments throughout the book. This book is written for researchers and (post-)graduate students in pattern recognition and information retrieval. It will also be of interest to people in areas like history, criminology, or psychology who need technical support to evaluate, understand or decode historical or contemporary handwritten text.

Document Analysis Systems

Title Document Analysis Systems
Author Seiichi Uchida
Publisher Springer Nature
Pages 795
Release 2022-05-17
Genre Computers
ISBN 3031065557

This book constitutes the refereed proceedings of the 15th IAPR International Workshop on Document Analysis Systems, DAS 2022, held in La Rochelle, France, in May 2022. The full papers presented were carefully reviewed and selected from numerous submissions addressing key techniques of document analysis.

Pattern Recognition and Image Analysis

Title Pattern Recognition and Image Analysis
Author Armando J. Pinho
Publisher Springer Nature
Pages 704
Release 2022-04-25
Genre Computers
ISBN 3031048814

This book constitutes the refereed proceedings of the 10th Iberian Conference on Pattern Recognition and Image Analysis, IbPRIA 2022, held in Aveiro, Portugal, in May 2022. The 54 papers accepted for these proceedings were carefully reviewed and selected from 72 submissions. They deal with document analysis; medical image processing; biometrics; pattern recognition and machine learning; computer vision; and other applications.

Pattern Recognition and Image Analysis

Title Pattern Recognition and Image Analysis
Author Aythami Morales
Publisher Springer Nature
Pages 534
Release 2019-09-21
Genre Computers
ISBN 3030313212

This 2-volume set constitutes the refereed proceedings of the 9th Iberian Conference on Pattern Recognition and Image Analysis, IbPRIA 2019, held in Madrid, Spain, in July 2019. The 99 papers in these volumes were carefully reviewed and selected from 137 submissions. They are organized in topical sections named: Part I: best ranked papers; machine learning; pattern recognition; image processing and representation. Part II: biometrics; handwriting and document analysis; other applications.

Content-based Handwritten Document Indexing and Retrieval

Title Content-based Handwritten Document Indexing and Retrieval
Author
Publisher
Pages 121
Release 2008
Genre
ISBN

Information retrieval on textual data has been well studied and its applications (such as web searching) have become ubiquitous in our daily lives. However, content-based image retrieval on handwritten document collections remains a challenging problem. Here "content-based" means that the search will analyze the actual content of the images, instead of merely the metadata. In the context of handwritten documents, the word "content" might refer to different things, such as writing style, shape of words and characters, or the truth of the writing. Accordingly, two different types of retrieval can be performed: "query by example" and semantic (or "query by text") retrieval. While both of them have their own applications in the real world, the second one is more intuitive and user-friendly, since it uses not only the low-level underlying computational features, but also the understanding of documents. This work explores several automatic techniques to do both types of retrieval upon handwritten document collections. These techniques are three-fold: (i) indexing, (ii) "query by example" retrieval and (iii) "query by text" retrieval. For indexing, we focus on the problems of word segmentation and transcript mapping. Word segmentation is the task of segmenting text line images into word images, which is one of the most important preprocessing steps for any word-level analysis or recognition. We propose the use of a neural network with a new set of global and local features to classify gaps as inter-word or intra-word. The transcript mapping problem is an alignment problem between the handwritten document image and its transcript. It is not a trivial task, because the word segmentation algorithm is error prone. A recognition-based dynamic programming algorithm is proposed to solve this problem. It is also shown to improve the accuracy of automatic word segmentation.
In "query by example" retrieval, the query can be either a full page document or a single word image. For the document level retrieval, a statistical model is learned to determine whether the writing styles of two documents are similar or not. Gamma and Gaussian distributions are used for the modeling. Word level retrieval is performed by a feature based similarity search algorithm. For each word image, a 1024-bit binary feature vector is extracted for this purpose. "Query by text" retrieval is a more challenging task because word level segmentation is error prone and word recognition with large lexicon size is still an unsolved problem. The current solution for this problem is to manually annotate the collection, which is costly. By taking the idea from machine translation in textual information retrieval, we propose a statistical approach for word recognition and use the probabilistic annotation results to do language model retrieval on handwritten documents. For all these approaches, their performances are empirically compared on several test collections. The main contributions of this work are a detailed examination of different levels of content-based image retrieval for handwritten documents, and the development of a retrieval system that allows either image or text queries. The new word segmentation method shows an improved performance over a previous method and is useful in forensic document analysis. In addition, a large handwriting database of 3824 pages (about 573,600 labeled words) was created using the proposed transcript-mapping algorithm. This database was used predominantly in this dissertation and it serves as a useful resource for future handwriting analysis and recognition research.
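The word-level "query by example" step described above can be illustrated with a minimal sketch. The abstract does not say which similarity measure is applied to the 1024-bit binary feature vectors, so the Hamming distance used here is an assumption, and the names `hamming_distance`, `rank_by_similarity`, and `toy_index`, as well as the toy bit patterns, are hypothetical and not taken from the dissertation.

```python
from typing import Dict, List, Tuple

N_BITS = 1024  # vector length stated in the abstract


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two feature vectors differ."""
    return bin(a ^ b).count("1")


def rank_by_similarity(
    query: int, index: Dict[str, int], top_k: int = 3
) -> List[Tuple[str, int]]:
    """Return the top_k indexed word images closest to the query, nearest first.

    Assumption: smaller Hamming distance means more similar word images.
    Ties are broken by word ID so the ordering is deterministic.
    """
    scored = [(word_id, hamming_distance(query, vec)) for word_id, vec in index.items()]
    scored.sort(key=lambda pair: (pair[1], pair[0]))
    return scored[:top_k]


# Hypothetical toy index: IDs and bit patterns are made up for illustration.
toy_index = {
    "word_a": (1 << N_BITS) - 1,  # all 1024 bits set
    "word_b": 0,                  # no bits set
    "word_c": (1 << 512) - 1,     # lower 512 bits set
}
query_vector = (1 << 500) - 1     # lower 500 bits set
results = rank_by_similarity(query_vector, toy_index, top_k=2)
print(results)
```

Each vector is stored as a Python integer, so the XOR and bit count operate on all 1024 bits at once. A real system over a large collection would likely need a packed array representation and an approximate search structure instead of this linear scan.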

Pattern Recognition and Image Analysis

Title Pattern Recognition and Image Analysis
Author Luís A. Alexandre
Publisher Springer
Pages 550
Release 2017-06-08
Genre Computers
ISBN 3319588389

This book constitutes the refereed proceedings of the 8th Iberian Conference on Pattern Recognition and Image Analysis, IbPRIA 2017, held in Faro, Portugal, in June 2017. The 60 regular papers presented in this volume were carefully reviewed and selected from 86 submissions. They are organized in topical sections named: Pattern Recognition and Machine Learning; Computer Vision; Image and Signal Processing; Medical Image; and Applications.

Handwritten Historical Document Analysis, Recognition, And Retrieval - State Of The Art And Future Trends

Title Handwritten Historical Document Analysis, Recognition, And Retrieval - State Of The Art And Future Trends
Author Andreas Fischer
Publisher World Scientific
Pages 269
Release 2020-11-11
Genre Computers
ISBN 9811203253

In recent years, libraries and archives all around the world have increased their efforts to digitize historical manuscripts. To integrate the manuscripts into digital libraries, pattern recognition and machine learning methods are needed to extract and index the contents of the scanned images. This unique compendium describes the outcome of the HisDoc research project, a pioneering attempt to study the whole processing chain of layout analysis, handwriting recognition, and retrieval of historical manuscripts. This description is complemented with an overview of other related research projects, in order to convey the current state of the art in the field and outline future trends. This must-have volume is a relevant reference work for librarians, archivists and computer scientists.