Inversion of speech by non-linear transformation of temporary

Authors

Wielgat R., Lorenc A.

DOI:

https://doi.org/10.5604/01.3001.0010.7714

Keywords:

inversion of speech, non-linear transformation of the time, mel-cepstral parameters

Abstract

Electromagnetic articulography (EMA) is a precise method for assessing the speech articulators, carried out with sensors placed mainly on the tongue. Various methods are being developed to avoid the need for measurement with EMA sensors; one of them is speech inversion. This paper describes preliminary research on speech inversion based on the dynamic time warping (DTW) method. Mel-frequency cepstral coefficients (MFCC) were chosen to parametrize the acoustic speech signal. Root mean square errors (RMSE) of the evaluation are presented and discussed.
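
The abstract does not say how DTW is applied to inversion, so the following is only a minimal sketch under stated assumptions: a reference utterance is assumed to have both acoustic feature frames (standing in for MFCC vectors) and a one-dimensional articulatory trajectory (standing in for one EMA sensor coordinate), and a test utterance is aligned to it with a basic DTW so that the reference trajectory can be carried over and scored with RMSE. The function names `dtw_path`, `invert` and `rmse`, the Euclidean local distance and the symmetric step pattern are illustrative choices, not the authors' implementation.

```python
import math

def dtw_path(a, b):
    """Align two sequences of feature vectors; return (total cost, warping path)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j]: minimal accumulated distance aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean local distance (assumption)
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the last cell to recover the frame-to-frame alignment.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path

def invert(test_feats, ref_feats, ref_artic):
    """Estimate an articulatory trajectory for test_feats by warping ref_artic."""
    _, path = dtw_path(test_feats, ref_feats)
    buckets = [[] for _ in test_feats]
    for i, j in path:
        buckets[i].append(ref_artic[j])
    # Average the reference values matched to each test frame.
    return [sum(b) / len(b) for b in buckets]

def rmse(estimated, measured):
    """Root mean square error between two equally long trajectories."""
    return math.sqrt(
        sum((e - t) ** 2 for e, t in zip(estimated, measured)) / len(estimated)
    )

# Toy data (hypothetical): 1-D "acoustic" frames and one articulatory coordinate.
ref_feats = [[0.0], [1.0], [2.0]]
ref_artic = [10.0, 20.0, 30.0]
test_feats = [[0.0], [0.0], [1.0], [2.0]]    # same pattern, first frame held longer
measured = [10.0, 12.0, 20.0, 30.0]          # made-up "true" trajectory

estimated = invert(test_feats, ref_feats, ref_artic)
error = rmse(estimated, measured)
```

In a real setting the frames would be multi-dimensional MFCC vectors and the trajectory would have one value per sensor coordinate; the alignment logic stays the same because `math.dist` accepts vectors of any equal length.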

Published

2016-12-30

How to Cite

Wielgat, R., & Lorenc, A. (2016). Inversion of speech by non-linear transformation of temporary. Health Promotion & Physical Activity, (1), 139–150. https://doi.org/10.5604/01.3001.0010.7714