Speech production is a complex process involving multiple systems, including the cognitive, muscular, and respiratory systems. Precise synchrony among these systems is essential; any lapse in synchrony manifests as a disorder in one's speech, which makes speech a useful pathological indicator. Respiration is the primary mechanism in speech production: we first inhale and then produce speech while exhaling, and when we run out of breath, we stop speaking and inhale again. Although resting breathing is largely involuntary, speech production involves a systematic outflow of air during exhalation, shaped by the linguistic content and prosodic characteristics of the utterance. This relationship between speech and respiration makes sensing respiratory dynamics directly from speech plausible. Modeling such a relationship is neither easy nor direct, given the complex nature of both speech and respiration; however, machine learning and deep learning architectures enable us to capture such complex relationships.
In this thesis, we conduct a comprehensive study to establish the relationship between speech and respiration. We explore techniques for estimating breathing signals and breathing parameters from speech using deep learning architectures, and we address the challenges involved in establishing the practical utility of this technology.
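The framing of the speech-to-breathing mapping can be illustrated with a toy example. The sketch below is not the thesis's method: it replaces the deep architectures with a plain least-squares model and uses synthetic "frame features" whose dimensions (13 MFCC-like coefficients per frame) and names are illustrative assumptions; it only shows the supervised setup of predicting a breathing value per speech frame.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative assumptions: 500 speech frames, 13 MFCC-like features each.
n_frames, n_feats = 500, 13

# Synthetic target breathing waveform (one value per frame).
breathing = np.sin(np.linspace(0, 8 * np.pi, n_frames))

# Synthetic features correlated with breathing, plus noise, standing in
# for real acoustic features extracted from speech.
w_true = rng.normal(size=n_feats)
features = np.outer(breathing, w_true) + 0.1 * rng.normal(size=(n_frames, n_feats))

# Least-squares fit: learn a frame-wise mapping from features to breathing.
w, *_ = np.linalg.lstsq(features, breathing, rcond=None)
pred = features @ w

corr = np.corrcoef(pred, breathing)[0, 1]
print(f"correlation between predicted and true breathing: {corr:.2f}")
```

In the thesis the linear model would be replaced by a deep network operating on real acoustic features, but the input/output structure of the task is the same.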
Our main conclusion is that breathing patterns estimated from speech can reveal information about respiration rate and breathing capacity, and thus help assess a person's pathological condition from speech conversations. This would support early and remote diagnosis of various health conditions. Estimating the breathing signal and its parameters from the speech signal is an unobtrusive and potentially cost-effective option for long-term breathing monitoring in telehealthcare applications.
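Once a breathing waveform has been estimated from speech, a parameter such as respiration rate follows from simple post-processing. The sketch below is a minimal illustration, assuming the waveform has already been predicted by some model; the peak-picking rule and the half-amplitude threshold are illustrative choices, not the thesis's procedure.

```python
import numpy as np

def respiration_rate_bpm(breathing, fs):
    """Estimate respiration rate (breaths per minute) from a breathing
    waveform by counting inhalation peaks.

    `breathing` is assumed to be a model-estimated waveform; `fs` is its
    sampling rate in Hz. Peak rule and threshold are illustrative.
    """
    x = breathing - np.mean(breathing)
    # A sample larger than both neighbours and above half the maximum
    # amplitude counts as one inhalation peak.
    thresh = 0.5 * np.max(np.abs(x))
    peaks = [i for i in range(1, len(x) - 1)
             if x[i] > x[i - 1] and x[i] > x[i + 1] and x[i] > thresh]
    duration_min = len(x) / fs / 60.0
    return len(peaks) / duration_min

# Synthetic 60 s breathing waveform at 15 breaths/min, sampled at 10 Hz.
fs = 10
t = np.arange(0, 60, 1.0 / fs)
sig = np.sin(2 * np.pi * (15 / 60.0) * t)
print(f"estimated rate: {respiration_rate_bpm(sig, fs):.1f} breaths/min")
```

Real estimated waveforms would be noisier than this synthetic sinusoid, so a practical pipeline would smooth the signal before peak counting.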