Deep Learning Based Speech Quality Prediction

Title Deep Learning Based Speech Quality Prediction PDF eBook
Author Gabriel Mittag
Publisher Springer Nature
Pages 171
Release 2022-02-24
Genre Technology & Engineering
ISBN 3030914798

This book shows how recent deep learning methods can be applied to speech quality prediction and provides an in-depth analysis of how well different deep learning architectures suit the task. The author then shows how the resulting model outperforms traditional speech quality models and provides additional information about the cause of a quality impairment by predicting the speech quality dimensions of noisiness, coloration, discontinuity, and loudness.

Machine Learning Based Speech Quality Prediction

Title Machine Learning Based Speech Quality Prediction PDF eBook
Author Gabriel Mittag
Release 2022

New Era for Robust Speech Recognition

Title New Era for Robust Speech Recognition PDF eBook
Author Shinji Watanabe
Publisher Springer
Pages 433
Release 2017-10-30
Genre Computers
ISBN 331964680X

This book covers the state of the art in deep neural-network-based methods for noise robustness in distant speech recognition applications. It provides insights into, and detailed descriptions of, some of the new concepts and key technologies in the field, including novel architectures for speech enhancement, microphone arrays, robust features, acoustic model adaptation, training data augmentation, and training criteria. The contributed chapters also describe real-world applications, benchmark tools, and datasets widely used in the field. This book is intended for researchers and practitioners working in speech processing and recognition who are interested in the latest deep learning techniques for noise robustness. It will also be of interest to graduate students in electrical engineering or computer science, who will find it a useful guide to this field of research.

Speech Enhancement with Improved Deep Learning Methods

Title Speech Enhancement with Improved Deep Learning Methods PDF eBook
Author Mojtaba Hasannezhad
Release 2021

In real-world environments, speech signals are often corrupted by ambient noise during acquisition, which degrades the quality and intelligibility of the speech for a listener. As one of the central topics in speech processing, speech enhancement aims to recover clean speech from such a noisy mixture. Many traditional speech enhancement methods based on statistical signal processing have been proposed and widely used in the past. However, the performance of these methods was limited, and they therefore failed in sophisticated acoustic scenarios. Over the last decade, deep learning, as a primary tool for developing data-driven information systems, has led to revolutionary advances in speech enhancement. In this context, speech enhancement is treated as a supervised learning problem, which does not suffer from the issues faced by traditional methods. This supervised learning problem has three main components: input features, learning machine, and training target. In this thesis, various deep learning architectures and methods are developed to deal with the current limitations of these three components.

First, we propose a serial hybrid neural network model that integrates a new low-complexity fully convolutional neural network (CNN) and a long short-term memory (LSTM) network to estimate a phase-sensitive mask for speech enhancement. Instead of using traditional acoustic features as the model input, a CNN is employed to automatically extract sophisticated speech features that can maximize the performance of the model. Then, an LSTM network is chosen as the learning machine to model the strong temporal dynamics of speech. The model is designed to take full advantage of the temporal dependencies and spectral correlations present in the input speech signal while keeping the model complexity low. Also, an attention technique is embedded to adaptively recalibrate the useful CNN-extracted features. Through extensive comparative experiments, we show that the proposed model significantly outperforms some known neural network-based speech enhancement methods in the presence of highly non-stationary noises, while having relatively few model parameters compared to some commonly employed DNN-based methods.

Most of the available approaches for speech enhancement using deep neural networks face a number of limitations: they do not exploit the information contained in the phase spectrum, and their high computational complexity and memory requirements make them unsuited for real-time applications. Hence, a new phase-aware composite deep neural network (PACDNN) is proposed to address these challenges. Specifically, magnitude processing with a spectral mask and phase reconstruction using the phase derivative are proposed as key subtasks of the new network to simultaneously enhance the magnitude and phase spectra. Besides, the neural network is carefully designed to take advantage of the strong temporal and spectral dependencies of speech, while its components perform independently and in parallel to speed up the computation. The advantages of the proposed PACDNN model over some well-known DNN-based speech enhancement (SE) methods are demonstrated through extensive comparative experiments.

Considering that some acoustic scenarios could be better handled using a number of low-complexity sub-DNNs, each specifically designed to perform a particular task, we propose another very low-complexity, fully convolutional framework that performs speech enhancement in the short-time modified discrete cosine transform (STMDCT) domain. This framework is made up of two main stages: classification and mapping. In the former stage, a CNN-based network is proposed to classify the input speech based on its utterance-level attributes, i.e., signal-to-noise ratio and gender. In the latter stage, four well-trained CNNs, each specialized for a specific and simple task, transform the STMDCT of the noisy input speech to the clean one. Since this framework is designed to operate in the STMDCT domain, there is no need to deal with the phase information, i.e., no phase-related computation is required. Moreover, the training target length is only one-half of that in the previous chapters, leading to lower computational complexity and less demand on the mapping CNNs. Although there are multiple branches in the model, only one of the expert CNNs is active at a time, i.e., the computational burden is related only to a single branch at any time. Also, the mapping CNNs are fully convolutional, and their computations are performed in parallel, thus reducing the computational time. Moreover, this proposed framework reduces the latency by 55% compared to the models in the previous chapters. Through extensive experimental studies, it is shown that the MBSE framework not only gives superior speech enhancement performance but also has lower complexity than some existing deep learning-based methods.
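
The abstract above refers to estimating a phase-sensitive mask as the training target but does not give its formula. As a rough illustration only, the sketch below uses one commonly cited definition, the clean-to-noisy magnitude ratio scaled by the cosine of the phase difference and truncated to [0, 1]. The truncation range, the function names `phase_sensitive_mask` and `apply_mask`, and the `eps` constant are assumptions made for this example and are not taken from the thesis.

```python
import cmath
import math


def phase_sensitive_mask(clean_bin, noisy_bin, eps=1e-8):
    """Truncated phase-sensitive mask for one time-frequency bin.

    clean_bin and noisy_bin are complex STFT coefficients.
    Assumption: the mask is clipped to [0, 1]; other ranges are also used.
    """
    # Magnitude ratio between clean and noisy speech (eps avoids division by zero).
    ratio = abs(clean_bin) / (abs(noisy_bin) + eps)
    # Phase difference between the clean and noisy coefficients.
    phase_diff = cmath.phase(clean_bin) - cmath.phase(noisy_bin)
    # Scale by cos(phase difference), then truncate to [0, 1].
    return min(1.0, max(0.0, ratio * math.cos(phase_diff)))


def apply_mask(mask_frame, noisy_frame):
    """Multiply each noisy STFT bin by its real-valued mask value."""
    return [m * y for m, y in zip(mask_frame, noisy_frame)]


# Toy frame with three frequency bins (hypothetical values).
clean = [1 + 0j, 0 + 1j, 1 + 0j]
noisy = [2 + 0j, 0 + 2j, -1 + 0j]
masks = [phase_sensitive_mask(s, y) for s, y in zip(clean, noisy)]
enhanced = apply_mask(masks, noisy)
```

In this toy frame the first two bins are in phase with the clean signal, so the mask is close to the magnitude ratio, while the third bin has opposite phase and is suppressed by the truncation. In a learned system a network would predict the mask from the noisy input alone; here it is computed from known clean speech, as it would be when building training targets.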

Speech Signal Processing Based on Deep Learning in Complex Acoustic Environments

Title Speech Signal Processing Based on Deep Learning in Complex Acoustic Environments PDF eBook
Author Xiao-Lei Zhang
Publisher Elsevier
Pages 282
Release 2024-09-04
Genre Computers
ISBN 0443248575

Speech Signal Processing Based on Deep Learning in Complex Acoustic Environments provides a detailed discussion of deep learning-based robust speech processing and its applications. The book begins by looking at the basics of deep learning and common deep network models, followed by front-end algorithms for deep learning-based speech denoising, speech detection, single-channel speech enhancement, multi-channel speech enhancement, multi-speaker speech separation, and the applications of deep learning-based speech denoising in speaker verification and speech recognition.
- Provides a comprehensive introduction to the development of deep learning-based robust speech processing
- Covers speech detection, speech enhancement, dereverberation, multi-speaker speech separation, robust speaker verification, and robust speech recognition
- Focuses on a historical overview and then covers methods that demonstrate outstanding performance in practical applications

Deep Learning Approaches for Spoken and Natural Language Processing

Title Deep Learning Approaches for Spoken and Natural Language Processing PDF eBook
Author Virender Kadyan
Publisher Springer Nature
Pages 171
Release 2022-01-01
Genre Technology & Engineering
ISBN 3030797783

This book provides insights into how deep learning techniques impact language and speech processing applications. The authors discuss the promise, limits, and new challenges of deep learning. The book covers the major differences between the various applications of deep learning and classical machine learning techniques. The main objective of the book is to present a comprehensive survey of the major applications and research-oriented articles based on deep learning techniques that focus on natural language and speech signal processing. The book is relevant to academicians, research scholars, industrial experts, scientists, and postgraduate students who work in speech signal and natural language processing and would like to add deep learning to enhance the capabilities of their work. Discusses current research challenges and future perspectives on how deep learning techniques can be applied to improve NLP and speech processing applications; Presents and escalates the research trends and future direction of language and speech processing; Includes theoretical research, experimental results, and applications of deep learning.

Speech and Computer

Title Speech and Computer PDF eBook
Author Alexey Karpov
Publisher Springer Nature
Pages 587
Release 2023-12-23
Genre Computers
ISBN 303148312X

The two-volume proceedings set LNAI 14338 and 14339 constitutes the refereed proceedings of the 25th International Conference on Speech and Computer, SPECOM 2023, held in Dharwad, India, during November 29–December 2, 2023. The 94 papers included in these proceedings were carefully reviewed and selected from 174 submissions. They focus on all aspects of speech science and technology: automatic speech recognition; computational paralinguistics; digital signal processing; speech prosody; natural language processing; child speech processing; speech processing for medicine; industrial speech and language technology; speech technology for under-resourced languages; speech analysis and synthesis; speaker and language identification, verification and diarization.