The main goal of speech modification is to control the non-linguistic information carried by a speech signal, such as voice quality and voice individuality. The term usually refers to the various modifications that can be applied to the sound produced by people (McAulay and Quatieri, 1986).
Speech modification is used in a wide range of areas, including speech synthesis, the film and music industries, voice editing, dubbing, toys, chat rooms, games, communications (interpreting telephony, helium speech), voice pathology (voice restoration), high-end hearing aids, and other applications such as confusing speaker identification systems.
Speech modification spans the whole chain from modeling and understanding speech production to speech perception. It draws on natural language modeling and processing, control of speaking style, and statistical signal processing (Cooke, Mayo and Valentini-Botinhao, 2013).
As a field of study, speech modification is closely connected with many other areas of speech research, such as speech modeling, speech coding, speech recognition, speech enhancement and speech synthesis.
There is also a wide range of ways to modify speech. The main methods include time-scale modification, pitch modification, speaker modification, voice alteration, voice quality control, voice morphing and voice conversion (Erro, 2008).
According to the research literature, speech modification results in a loss of voice individuality, both in its social and psychological dimension (speaking style, emotional state) and in its physiological dimension (voice quality). Speaking style is mainly realized in prosodic features such as pitch contour, word duration, rhythm, pauses and power levels, while voice quality is mostly reflected in the power spectrum of the glottal source signal and the vocal tract filter (including the nasal cavity and mouth) (Stylianou and Moulines, 1995).
Voice individuality has two groups of acoustic characteristics: those of the voice source and those of the vocal tract filter. Voice source characteristics include the glottal wave shape, the fundamental frequency contour (pitch contour), the average fundamental frequency and fundamental frequency fluctuations. Vocal tract filter characteristics include the speech spectrum, the formant frequencies, the shape of the spectral envelope and the spectral tilt, and the bandwidths and formant trajectories.
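To make one of the voice source characteristics concrete, the following is a minimal sketch of how the average fundamental frequency of a single voiced frame might be estimated by autocorrelation. It is an illustration only and is not taken from the cited papers: the function name `estimate_f0`, the search range of 60 to 400 Hz and the synthetic test frame are all assumptions made for this example.

```python
import numpy as np

def estimate_f0(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Estimate the fundamental frequency (Hz) of one voiced frame.

    Hypothetical helper: picks the autocorrelation peak inside an
    assumed plausible pitch range [f0_min, f0_max].
    """
    frame = np.asarray(frame, dtype=float)
    frame = frame - np.mean(frame)
    # Autocorrelation for non-negative lags only.
    acf = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    # Restrict the search to lags that correspond to plausible pitch values.
    lag_min = int(sample_rate / f0_max)
    lag_max = min(int(sample_rate / f0_min), len(acf) - 1)
    best_lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    return sample_rate / best_lag

# Synthetic "voiced" frame (an assumption, not real speech):
# a 200 Hz fundamental plus two weaker harmonics, 40 ms at 16 kHz.
sample_rate = 16000
t = np.arange(int(0.04 * sample_rate)) / sample_rate
frame = (np.sin(2 * np.pi * 200 * t)
         + 0.5 * np.sin(2 * np.pi * 400 * t)
         + 0.25 * np.sin(2 * np.pi * 600 * t))
f0 = estimate_f0(frame, sample_rate)
```

Applying such an estimator frame by frame would give a rough pitch contour. Real speech would additionally need a voiced/unvoiced decision and some smoothing, which this sketch leaves out.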
Another important feature of some voices is a speech disorder. Speech disorders are an important indicator of conditions such as vocal cord damage, brain damage, muscle weakness, respiratory weakness, strokes, polyps or nodules on the vocal cords, vocal cord paralysis, attention deficit hyperactivity disorder, oral cancer, laryngeal cancer, Huntington’s disease, dementia and amyotrophic lateral sclerosis (Lou Gehrig’s disease), yet they are lost after speech modification.
Two main methods of speech modification deserve a closer look:
Time-scale modification changes the articulation rate without affecting the perceptual quality of the original speech (Moulines and Laroche, 1995).
Pitch modification changes the fundamental frequency while preserving the short-time envelope characteristics and the duration of the original speech (Moulines and Laroche, 1995).
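The two methods can be illustrated with a small sketch. The function `wsola` below is a simplified overlap-add time-scale modification in the spirit of the waveform-similarity approach of Verhelst and Roelands (1993); it is not their exact algorithm, and the frame length, search tolerance and function names are assumptions chosen for this example. The function `pitch_shift_resample` then changes pitch by stretching and resampling. Note that this naive resampling also scales the spectral envelope, so, unlike the pitch modification described above, it does not preserve the short-time envelope characteristics.

```python
import numpy as np

def wsola(x, rate, frame_len=512, tol=128):
    """Simplified WSOLA-style time-scale modification (illustrative sketch).

    rate > 1 speeds speech up, rate < 1 slows it down; pitch is kept by
    choosing each frame where it best continues the previous one.
    """
    x = np.asarray(x, dtype=float)
    hop = frame_len // 2
    # Periodic Hann window: overlapping halves sum to one.
    window = np.hanning(frame_len + 1)[:-1]
    out_len = int(round(len(x) / rate))
    n_frames = max(1, -(-out_len // hop))  # ceiling division
    # Zero-pad so every search region stays inside the array.
    padded = np.pad(x, (tol, 2 * frame_len + 2 * tol))
    y = np.zeros((n_frames - 1) * hop + frame_len)
    pos = tol  # first frame starts at the beginning of x
    for i in range(n_frames):
        if i > 0:
            ideal = int(round(i * hop * rate)) + tol
            # Natural continuation of the previously copied frame.
            target = padded[pos + hop:pos + hop + frame_len]
            # Candidate frames start at ideal - tol ... ideal + tol.
            region = padded[ideal - tol:ideal + tol + frame_len]
            corr = np.correlate(region, target, mode="valid")
            pos = ideal - tol + int(np.argmax(corr))
        y[i * hop:i * hop + frame_len] += window * padded[pos:pos + frame_len]
    # The first half-frame fades in because no earlier frame overlaps it.
    return y[:out_len]

def pitch_shift_resample(x, factor, **kwargs):
    """Naive pitch change by `factor` with (roughly) unchanged duration.

    Stretches the signal by `factor`, then reads it back `factor` times
    faster using linear interpolation. Formants are shifted as well.
    """
    stretched = wsola(x, 1.0 / factor, **kwargs)
    positions = np.arange(0, len(stretched) - 1, factor)
    return np.interp(positions, np.arange(len(stretched)), stretched)

# Usage on a synthetic 200 Hz tone (an assumption, not real speech).
sample_rate = 16000
tone = np.sin(2 * np.pi * 200 * np.arange(sample_rate) / sample_rate)
slower = wsola(tone, 0.5)                 # twice as long, same pitch
higher = pitch_shift_resample(tone, 1.5)  # same length, higher pitch
```

The similarity search is what distinguishes this from plain overlap-add: without it, frames of a periodic signal would be added with mismatched phase, which is audible as roughness. Envelope-preserving pitch modification, as in the pitch-synchronous methods surveyed by Moulines and Laroche (1995), requires additional pitch-mark analysis that is beyond this sketch.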
Speech modification is used very broadly, especially when accurate reception of a message is threatened by poor listening conditions. Recently, researchers have shown great interest in speech modifications whose goal is to increase the intelligibility of both natural and synthetic speech when the message is delivered in the presence of noise (Valbret, Moulines and Tubach, 1992).
In conclusion, the first large-scale open evaluations of speech modification algorithms designed to enhance intelligibility have demonstrated worthwhile gains over a relatively clear unmodified speech baseline. It may be possible to combine techniques or their components, which would lead to larger gains. Other factors that may be measured in future comparisons include speech quality, perceived loudness and computational complexity.
Bibliography
Kuwabara, H. and Sagisaka, Y., 1995. Acoustic characteristics of speaker individuality: Control and conversion, Speech Communication, vol. 16, no. 2, pp. 165-173
Moulines, E. and Laroche, J., 1995. Techniques for pitch-scale and time-scale transformation of speech. Part I - non parametric methods, Speech Communication, vol. 16
Verhelst, W. and Roelands, M., 1993. An overlap-add technique based on waveform similarity (WSOLA) for high quality time-scale modification of speech, Proc. ICASSP93, pp. 554-557
Stylianou, Y., Laroche, J. and Moulines, E., 1995. High-Quality Speech Modification based on a Harmonic + Noise Model, Proc. EUROSPEECH
McAulay, R. and Quatieri, T., 1986. Speech analysis/synthesis based on a sinusoidal representation, IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-34, pp. 744-754
Kawahara, H., 1997. Speech representation and transformation using adaptive interpolation of weighted spectrum: vocoder revisited, Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, (Munich, Germany), pp. 1303-1306
Abe, M., Nakamura, S., Shikano, K. and Kuwabara, H., 1988. Voice conversion through vector quantization, Proc. ICASSP88, pp. 655-65
Valbret, H., Moulines, E. and Tubach, J., 1992. Voice transformation using PSOLA techniques, Speech Communication, vol. 11, no. 2-3, pp. 175-187
Iwahashi, N. and Sagisaka, Y., 1994. Speech spectrum transformation based on speaker interpolation, Proc. ICASSP94
Stylianou, Y. and Moulines, E., 1995. Statistical methods for voice quality transformation, Proc. EUROSPEECH
Kain, A. and Macon, M., 1998. Spectral voice conversion for text-to-speech synthesis, Proc. ICASSP98, pp. 285-288
Toda, T., Saruwatari, H. and Shikano, K., 2001. Voice Conversion Algorithm based on Gaussian Mixture Model with Dynamic Frequency Warping of STRAIGHT spectrum, Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, (Salt Lake City, USA), pp. 841-844
Toda, T., Black, A. and Tokuda, K., 2005. Spectral Conversion Based on Maximum Likelihood Estimation considering Global Variance of Converted Parameter, Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, (Philadelphia, USA), pp. 9-12
Turk, O. and Arslan, L. M., 2006. Robust processing techniques for voice conversion, Computer Speech and Language, vol. 20, pp. 441-467
Suendermann, D., Hoege, H., Bonafonte, A., Ney, H., Black, A. and Narayanan, S., 2006. Text-independent voice conversion based on unit selection, Proc. ICASSP06, (Toulouse, France), pp. 81-84
Mouchtaris, A. and Mueller, P., 2006. Non parallel training for voice conversion based on parameter adaptation, IEEE Trans. Audio, Speech, and Language Processing, vol. 14, no. 3, pp. 952-963
Erro, D., 2008. Intra-lingual and cross-lingual Voice Conversion using Harmonic plus Stochastic models. PhD thesis, UPC, Barcelona, Spain
Yoo, S. and Boston, J., 2007. Speech signal modification to increase intelligibility in noisy environments, J. Acoust. Soc. Am., vol. 122, no. 2, pp. 1138-1149
Cooke, M., Mayo, C. and Valentini-Botinhao, C., 2013. Evaluating the intelligibility benefit of speech modifications in known noise conditions, Speech Communication, vol. 55
Rothauser, E., Chapman, W. and Guttman, N., 1969. IEEE Recommended practice for speech quality measurements, IEEE Trans. on Audio and Electroacoustics, vol. 17, pp. 225-246
Cooke, M., Mayo, C., Valentini-Botinhao, C. Intelligibility-enhancing speech modifications: the Hurricane Challenge, Basque Foundation for Science, Bilbao, Spain, Language and Speech Laboratory, University of the Basque Country, Vitoria, Spain, Centre for Speech Technology Research, University of Edinburgh, UK