Vocal Quality Factors: Analysis, Synthesis, and Perception

Overview

Journal J Acoust Soc Am

Publisher American Institute of Physics

Specialty Otorhinolaryngology

Date 1991 Nov 1

PMID 1837797

Citations 32

Authors

D G Childers

C K Lee

Abstract

The purpose of this study was to examine several factors of vocal quality that might be affected by changes in vocal fold vibratory patterns. Four voice types were examined: modal, vocal fry, falsetto, and breathy. Three categories of analysis techniques were developed to extract source-related features from speech and electroglottographic (EGG) signals. Four factors were found to be important for characterizing the glottal excitations of the four voice types: the glottal pulse width, the glottal pulse skewness, the abruptness of glottal closure, and the turbulent noise component. The significance of these factors for voice synthesis was studied, and a new voice source model that accounted for certain physiological aspects of vocal fold motion was developed and tested using speech synthesis. Perceptual listening tests were conducted to evaluate the auditory effects of the source model parameters on synthesized speech. The effects of the spectral slope of the source excitation, the shape of the glottal excitation pulse, and the characteristics of the turbulent noise source were considered. Applications of these research results include synthesis of natural-sounding speech, synthesis and modeling of vocal disorders, and the development of speaker-independent (or adaptive) speech recognition systems.
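
As a rough illustration of how the four source factors named in the abstract could be parameterized and measured, the sketch below generates one period of a Rosenberg-style glottal flow pulse and reads back its width, skewness, and steepest closing slope, then adds Gaussian noise as a stand-in for the turbulent component. This is not the voice source model developed in the paper, which the abstract does not specify. The function names, the parameters `open_quotient`, `speed_quotient`, and `noise_gain`, and the measurement definitions are all assumptions made for this example.

```python
import math
import random


def rosenberg_pulse(n_period, open_quotient=0.6, speed_quotient=2.0):
    """One period of a Rosenberg-style glottal flow pulse (illustrative only).

    open_quotient  : open-phase fraction of the period (stands in for pulse width).
    speed_quotient : opening duration / closing duration (stands in for skewness).
    """
    n_open = int(round(open_quotient * n_period))
    n_rise = int(round(n_open * speed_quotient / (1.0 + speed_quotient)))
    n_fall = n_open - n_rise
    if n_rise < 1 or n_fall < 1:
        raise ValueError("parameters leave no room for opening or closing phase")
    pulse = []
    for n in range(n_period):
        if n < n_rise:
            # Opening phase: raised-cosine rise from 0 toward 1.
            pulse.append(0.5 * (1.0 - math.cos(math.pi * n / n_rise)))
        elif n < n_open:
            # Closing phase: quarter-cosine fall from 1 toward 0.
            pulse.append(math.cos(0.5 * math.pi * (n - n_rise) / n_fall))
        else:
            # Closed phase: no flow.
            pulse.append(0.0)
    return pulse


def measure_pulse(pulse):
    """Return (width, skewness, steepest closing slope) for one period.

    These definitions are assumptions for this sketch, not the paper's.
    """
    n_period = len(pulse)
    open_idx = [i for i, v in enumerate(pulse) if v > 1e-9]
    start = max(open_idx[0] - 1, 0)   # last zero-flow sample before opening
    end = open_idx[-1] + 1            # first zero-flow sample after closure
    peak = max(range(n_period), key=lambda i: pulse[i])
    width = (end - start) / n_period
    skewness = (peak - start) / (end - peak)
    # Most negative first difference: a crude proxy for closure abruptness.
    closing_slope = min(pulse[i + 1] - pulse[i] for i in range(n_period - 1))
    return width, skewness, closing_slope


def add_aspiration_noise(pulse, noise_gain, seed=None):
    """Add white Gaussian noise as a simple stand-in for turbulent noise."""
    rng = random.Random(seed)
    return [v + noise_gain * rng.gauss(0.0, 1.0) for v in pulse]


if __name__ == "__main__":
    p = rosenberg_pulse(100, open_quotient=0.6, speed_quotient=2.0)
    print(measure_pulse(p))
```

With these definitions, the measured width and skewness recover the `open_quotient` and `speed_quotient` used to generate the pulse, which makes the sketch a convenient sanity check before trying other pulse shapes.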
