Speech Enhancement in the Karhunen-Loève Expansion Domain
Title | Speech Enhancement in the Karhunen-Loève Expansion Domain PDF eBook |
Author | Jacob Benesty |
Publisher | Springer Nature |
Pages | 102 |
Release | 2022-05-31 |
Genre | Technology & Engineering |
ISBN | 3031025601 |
This book is devoted to the study of the problem of speech enhancement whose objective is the recovery of a signal of interest (i.e., speech) from noisy observations. Typically, the recovery process is accomplished by passing the noisy observations through a linear filter (or a linear transformation). Since both the desired speech and undesired noise are filtered at the same time, the most critical issue of speech enhancement resides in how to design a proper optimal filter that can fully take advantage of the difference between the speech and noise statistics to mitigate the noise effect as much as possible while maintaining the speech perception identical to its original form. The optimal filters can be designed either in the time domain or in a transform space. As the title indicates, this book will focus on developing and analyzing optimal filters in the Karhunen-Loève expansion (KLE) domain. We begin by describing the basic problem of speech enhancement and the fundamental principles to solve it in the time domain. We then explain how the problem can be equivalently formulated in the KLE domain. Next, we divide the general problem in the KLE domain into four groups, depending on whether interframe and interband information is accounted for, leading to four linear models for speech enhancement in the KLE domain. For each model, we introduce signal processing measures to quantify the performance of speech enhancement, discuss the formation of different cost functions, and address the optimization of these cost functions for the derivation of different optimal filters. Both theoretical analysis and experiments will be provided to study the performance of these filters and the links between the KLE-domain and time-domain optimal filters will be examined. 
Table of Contents: Introduction / Problem Formulation / Optimal Filters in the Time Domain / Linear Models for Signal Enhancement in the KLE Domain / Optimal Filters in the KLE Domain with Model 1 / Optimal Filters in the KLE Domain with Model 2 / Optimal Filters in the KLE Domain with Model 3 / Optimal Filters in the KLE Domain with Model 4 / Experimental Study
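As a rough illustration of what filtering in the KLE domain means, the sketch below enhances a two-sample frame with one Wiener-type gain per KLE subband. It is a minimal sketch and not the book's formulation: the function name `kle_wiener_2tap`, the 2x2 speech correlation matrix [[r0, r1], [r1, r0]], and the white-noise assumption are all chosen here for illustration, because that case has a closed-form eigendecomposition.

```python
import math


def kle_wiener_2tap(y, r0, r1, noise_var):
    """Enhance a length-2 noisy frame y = x + v with per-subband gains.

    Illustrative assumptions: the speech correlation matrix is
    [[r0, r1], [r1, r0]] and the noise is white with variance noise_var.
    """
    s = 1.0 / math.sqrt(2.0)
    # Orthonormal eigenvectors of the symmetric 2x2 matrix (the KLE basis).
    basis = [(s, s), (s, -s)]
    # Matching eigenvalues, i.e. the speech variance in each KLE subband.
    eigvals = [r0 + r1, r0 - r1]

    x_hat = [0.0, 0.0]
    gains = []
    for q, lam in zip(basis, eigvals):
        # Analysis: project the noisy frame onto the basis vector.
        coeff = q[0] * y[0] + q[1] * y[1]
        # Wiener-type gain: speech variance over speech-plus-noise variance.
        gain = lam / (lam + noise_var)
        gains.append(gain)
        # Synthesis: add the weighted coefficient back in the time domain.
        x_hat[0] += gain * coeff * q[0]
        x_hat[1] += gain * coeff * q[1]
    return x_hat, gains


x_hat, gains = kle_wiener_2tap([1.0, 1.0], r0=1.0, r1=0.5, noise_var=0.5)
```

With these example values the subband carrying more speech energy (eigenvalue 1.5) is attenuated less than the weaker one (eigenvalue 0.5), which is how the filter exploits the difference between speech and noise statistics.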
Speech Enhancement in the Karhunen-Loève Expansion Domain
Title | Speech Enhancement in the Karhunen-Loève Expansion Domain PDF eBook |
Author | Jacob Benesty |
Publisher | Morgan & Claypool Publishers |
Pages | 113 |
Release | 2011 |
Genre | Computers |
ISBN | 1608456048 |
This book is devoted to the study of the problem of speech enhancement whose objective is the recovery of a signal of interest (i.e., speech) from noisy observations. Typically, the recovery process is accomplished by passing the noisy observations through a linear filter (or a linear transformation). Since both the desired speech and undesired noise are filtered at the same time, the most critical issue of speech enhancement resides in how to design a proper optimal filter that can fully take advantage of the difference between the speech and noise statistics to mitigate the noise effect as much as possible while maintaining the speech perception identical to its original form. The optimal filters can be designed either in the time domain or in a transform space. As the title indicates, this book will focus on developing and analyzing optimal filters in the Karhunen-Loève expansion (KLE) domain. We begin by describing the basic problem of speech enhancement and the fundamental principles to solve it in the time domain. We then explain how the problem can be equivalently formulated in the KLE domain. Next, we divide the general problem in the KLE domain into four groups, depending on whether interframe and interband information is accounted for, leading to four linear models for speech enhancement in the KLE domain. For each model, we introduce signal processing measures to quantify the performance of speech enhancement, discuss the formation of different cost functions, and address the optimization of these cost functions for the derivation of different optimal filters. Both theoretical analysis and experiments will be provided to study the performance of these filters and the links between the KLE-domain and time-domain optimal filters will be examined. 
Table of Contents: Introduction / Problem Formulation / Optimal Filters in the Time Domain / Linear Models for Signal Enhancement in the KLE Domain / Optimal Filters in the KLE Domain with Model 1 / Optimal Filters in the KLE Domain with Model 2 / Optimal Filters in the KLE Domain with Model 3 / Optimal Filters in the KLE Domain with Model 4 / Experimental Study
A Perspective on Single-Channel Frequency-Domain Speech Enhancement
Title | A Perspective on Single-Channel Frequency-Domain Speech Enhancement PDF eBook |
Author | Jacob Benesty |
Publisher | Springer Nature |
Pages | 101 |
Release | 2022-05-31 |
Genre | Technology & Engineering |
ISBN | 303102561X |
This book focuses on a class of single-channel noise reduction methods that are performed in the frequency domain via the short-time Fourier transform (STFT). The simplicity and relative effectiveness of this class of approaches make them the dominant choice in practical systems. Even though many popular algorithms have been proposed through more than four decades of continuous research, there are a number of critical areas where our understanding and capabilities still remain quite rudimentary, especially with respect to the relationship between noise reduction and speech distortion. All existing frequency-domain algorithms, no matter how they are developed, have one feature in common: the solution is eventually expressed as a gain function applied to the STFT of the noisy signal only in the current frame. As a result, the narrowband signal-to-noise ratio (SNR) cannot be improved, and any gains achieved in noise reduction on the fullband basis come with a price to pay, which is speech distortion. In this book, we present a new perspective on the problem by exploiting the difference between speech and typical noise in circularity and interframe self-correlation, which were ignored in the past. By gathering the STFT of the microphone signal of the current frame, its complex conjugate, and the STFTs in the previous frames, we construct several new, multiple-observation signal models similar to a microphone array system: there are multiple noisy speech observations, and their speech components are correlated but not completely coherent while their noise components are presumably uncorrelated. Therefore, the multichannel Wiener filter and the minimum variance distortionless response (MVDR) filter that were usually associated with microphone arrays will be developed for single-channel noise reduction in this book. This might instigate a paradigm shift geared toward speech distortionless noise reduction techniques. 
Table of Contents: Introduction / Problem Formulation / Performance Measures / Linear and Widely Linear Models / Optimal Filters with Model 1 / Optimal Filters with Model 2 / Optimal Filters with Model 3 / Optimal Filters with Model 4 / Experimental Study
Speech Enhancement in the STFT Domain
Title | Speech Enhancement in the STFT Domain PDF eBook |
Author | Jacob Benesty |
Publisher | Springer Science & Business Media |
Pages | 112 |
Release | 2011-09-18 |
Genre | Technology & Engineering |
ISBN | 3642232507 |
This work addresses the problem of speech enhancement (noise reduction) in the short-time Fourier transform (STFT) domain. We divide the general problem into five basic categories depending on the number of microphones being used and whether the interframe or interband correlation is considered. The first category deals with the single-channel problem where STFT coefficients at different frames and frequency bands are assumed to be independent. In this case, the noise reduction filter in each frequency band is basically a real gain. Since a gain does not improve the signal-to-noise ratio (SNR) for any given subband and frame, the noise reduction is basically achieved by emphasizing the subbands and frames that are less noisy while attenuating those that are noisier. The second category also concerns the single-channel problem. The difference is that now the interframe correlation is taken into account and a filter is applied in each subband instead of just a gain. The advantage of using the interframe correlation is that we can improve not only the long-time fullband SNR, but the frame-wise subband SNR as well. The third and fourth classes discuss the problem of multichannel noise reduction in the STFT domain with and without interframe correlation, respectively. In the last category, we consider the interband correlation in the design of the noise reduction filters. We illustrate the basic principle for the single-channel case as an example, while this concept can be generalized to other scenarios. In all categories, we propose different optimization cost functions from which we derive the optimal filters and we also define the performance measures that help analyze them.
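The point made above about the first category, that a real gain leaves each subband SNR unchanged and yet can raise the fullband SNR, can be checked with simple arithmetic. The sketch below assumes, purely for illustration, that the speech and noise power in each subband are known, and it uses a Wiener-type gain; the function names and the example values are not taken from the book.

```python
def wiener_gains(speech_psd, noise_psd):
    """One real gain per subband: speech power over total power."""
    return [s / (s + n) for s, n in zip(speech_psd, noise_psd)]


def subband_snrs(speech_psd, noise_psd, gains):
    """SNR in each subband after applying the gains."""
    return [(g * g * s) / (g * g * n)
            for g, s, n in zip(gains, speech_psd, noise_psd)]


def fullband_snr(speech_psd, noise_psd, gains):
    """Ratio of total filtered speech power to total filtered noise power."""
    speech = sum(g * g * s for g, s in zip(gains, speech_psd))
    noise = sum(g * g * n for g, n in zip(gains, noise_psd))
    return speech / noise


# Hypothetical two-subband example: subband 0 is clean, subband 1 is noisy.
speech_psd = [4.0, 1.0]
noise_psd = [1.0, 4.0]
unit = [1.0, 1.0]
gains = wiener_gains(speech_psd, noise_psd)

snr_in = fullband_snr(speech_psd, noise_psd, unit)
snr_out = fullband_snr(speech_psd, noise_psd, gains)
per_band_in = subband_snrs(speech_psd, noise_psd, unit)
per_band_out = subband_snrs(speech_psd, noise_psd, gains)
```

Here the gain cancels in each subband ratio, so `per_band_out` matches `per_band_in`, while `snr_out` exceeds `snr_in` because the noisier subband contributes less to the total.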
DFT-Domain Based Single-Microphone Noise Reduction for Speech Enhancement
Title | DFT-Domain Based Single-Microphone Noise Reduction for Speech Enhancement PDF eBook |
Author | Richard C. Hendriks |
Publisher | Springer Nature |
Pages | 70 |
Release | 2022-05-31 |
Genre | Technology & Engineering |
ISBN | 3031025644 |
As speech processing devices like mobile phones, voice controlled devices, and hearing aids have increased in popularity, people expect them to work anywhere and at any time without user intervention. However, the presence of acoustical disturbances limits the use of these applications, degrades their performance, or causes the user difficulties in understanding the conversation or appreciating the device. A common way to reduce the effects of such disturbances is through the use of single-microphone noise reduction algorithms for speech enhancement. The field of single-microphone noise reduction for speech enhancement comprises a history of more than 30 years of research. In this survey, we wish to demonstrate the significant advances that have been made during the last decade in the field of discrete Fourier transform domain-based single-channel noise reduction for speech enhancement. Furthermore, our goal is to provide a concise description of a state-of-the-art speech enhancement system, and demonstrate the relative importance of the various building blocks of such a system. This allows the non-expert DSP practitioner to judge the relevance of each building block and to implement a close-to-optimal enhancement system for the particular application at hand.
Table of Contents: Introduction / Single Channel Speech Enhancement: General Principles / DFT-Based Speech Enhancement Methods: Signal Model and Notation / Speech DFT Estimators / Speech Presence Probability Estimation / Noise PSD Estimation / Speech PSD Estimation / Performance Evaluation Methods / Simulation Experiments with Single-Channel Enhancement Systems / Future Directions
Acoustical Impulse Response Functions of Music Performance Halls
Title | Acoustical Impulse Response Functions of Music Performance Halls PDF eBook |
Author | Douglas Frey |
Publisher | Springer Nature |
Pages | 102 |
Release | 2022-05-31 |
Genre | Technology & Engineering |
ISBN | 3031025652 |
Digital measurement of the analog acoustical parameters of a music performance hall is difficult. The aim of such work is to create a digital acoustical derivation that is an accurate numerical representation of the complex analog characteristics of the hall. The present study describes the exponential sine sweep (ESS) measurement process in the derivation of an acoustical impulse response function (AIRF) of three music performance halls in Canada. It examines specific difficulties of the process, such as preventing the external effects of the measurement transducers from corrupting the derivation, and provides solutions, such as the use of filtering techniques in order to remove such unwanted effects. In addition, the book presents a novel method of numerical verification through mean-squared error (MSE) analysis in order to determine how accurately the derived AIRF represents the acoustical behavior of the actual hall.
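For readers unfamiliar with the exponential sine sweep, the sketch below generates one using the commonly cited closed-form phase expression, in which the instantaneous frequency rises exponentially from a start frequency to an end frequency. The function names, parameter names, and example values are assumptions made here; the book's actual measurement settings are not given in this description.

```python
import math


def exponential_sine_sweep(f_start, f_end, duration, sample_rate):
    """Return samples of a sweep whose frequency rises exponentially.

    Phase: 2*pi*f_start*T/ln(f_end/f_start) * (exp(t*ln(f_end/f_start)/T) - 1)
    """
    rate = math.log(f_end / f_start)
    scale = 2.0 * math.pi * f_start * duration / rate
    n_samples = int(duration * sample_rate)
    return [
        math.sin(scale * (math.exp(rate * (i / sample_rate) / duration) - 1.0))
        for i in range(n_samples)
    ]


def instantaneous_frequency(f_start, f_end, duration, t):
    """Frequency of the sweep at time t (derivative of phase over 2*pi)."""
    return f_start * (f_end / f_start) ** (t / duration)


# Hypothetical settings: a 1 s sweep from 20 Hz to 2 kHz sampled at 8 kHz.
sweep = exponential_sine_sweep(20.0, 2000.0, 1.0, 8000)
```

In a measurement of this kind the sweep is played in the hall, recorded, and deconvolved to obtain the impulse response; those later steps are not shown here.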
Speech Recognition Algorithms Using Weighted Finite-State Transducers
Title | Speech Recognition Algorithms Using Weighted Finite-State Transducers PDF eBook |
Author | Takaaki Hori |
Publisher | Springer Nature |
Pages | 161 |
Release | 2022-05-31 |
Genre | Technology & Engineering |
ISBN | 3031025628 |
This book introduces the theory, algorithms, and implementation techniques for efficient decoding in speech recognition mainly focusing on the Weighted Finite-State Transducer (WFST) approach. The decoding process for speech recognition is viewed as a search problem whose goal is to find a sequence of words that best matches an input speech signal. Since this process becomes computationally more expensive as the system vocabulary size increases, research has long been devoted to reducing the computational cost. Recently, the WFST approach has become an important state-of-the-art speech recognition technology, because it offers improved decoding speed with fewer recognition errors compared with conventional methods. However, it is not easy to understand all the algorithms used in this framework, and they are still in a black box for many people. In this book, we review the WFST approach and aim to provide comprehensive interpretations of WFST operations and decoding algorithms to help anyone who wants to understand, develop, and study WFST-based speech recognizers. We also mention recent advances in this framework and its applications to spoken language processing.
Table of Contents: Introduction / Brief Overview of Speech Recognition / Introduction to Weighted Finite-State Transducers / Speech Recognition by Weighted Finite-State Transducers / Dynamic Decoders with On-the-fly WFST Operations / Summary and Perspective
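The view of decoding as a search problem can be illustrated with a toy lowest-cost path search over a small weighted transducer. This is only a sketch under assumptions made here: weights are non-negative costs that add along a path (the tropical semiring), the graph is stored as a plain dictionary, and `best_path` is a hypothetical helper, not an API from the book or from any WFST toolkit. A real decoder would also match arcs against acoustic input and prune the search.

```python
import heapq
from itertools import count


def best_path(arcs, start, finals):
    """Find the lowest-cost accepting path with Dijkstra's algorithm.

    arcs: dict mapping state -> list of (in_label, out_label, weight, next_state)
    finals: dict mapping final state -> final weight
    Returns (cost, output_labels), or None if no final state is reachable.
    """
    super_final = object()  # sentinel reached only through a final weight
    tie = count()           # tie-breaker so heap entries never compare states
    heap = [(0.0, next(tie), start, ())]
    settled = set()
    while heap:
        cost, _, state, outputs = heapq.heappop(heap)
        if state is super_final:
            return cost, list(outputs)
        if state in settled:
            continue
        settled.add(state)
        if state in finals:
            heapq.heappush(
                heap, (cost + finals[state], next(tie), super_final, outputs))
        for _in_label, out_label, weight, nxt in arcs.get(state, []):
            if nxt not in settled:
                # An out_label of None stands for an epsilon (empty) output.
                new_out = outputs if out_label is None else outputs + (out_label,)
                heapq.heappush(heap, (cost + weight, next(tie), nxt, new_out))
    return None


# Hypothetical toy graph with two competing word sequences.
toy_arcs = {
    0: [("a", "hello", 1.0, 1), ("a", "yellow", 0.5, 2)],
    1: [("b", "world", 1.0, 3)],
    2: [("b", "word", 2.5, 3)],
}
result = best_path(toy_arcs, start=0, finals={3: 0.0})
```

In this toy graph the path that starts with the cheaper arc ends up more expensive overall, so the search returns the "hello world" path with total cost 2.0.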