Inversion of speech by non-linear transformation of time
DOI:
https://doi.org/10.5604/01.3001.0010.7714
Keywords:
inversion of speech, non-linear transformation of the time, mel-cepstral parameters
Abstract
Electromagnetic articulography (EMA) is a precise method of assessing the movement of the speech articulators, using sensors placed mainly on the tongue. Various methods are being developed to avoid the need for EMA sensors. One of them is speech inversion, that is, estimating articulator positions from the acoustic signal. This paper describes preliminary research on speech inversion based on dynamic time warping (DTW). Mel-frequency cepstral coefficients (MFCC) were chosen to parametrize the acoustic speech signal. The root mean square errors (RMSE) obtained in the evaluation are presented and discussed.
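The abstract does not give the details of the procedure, so the following is only a minimal sketch of how DTW-based inversion could work, not the authors' implementation. It assumes a reference utterance with both MFCC frames and frame-synchronous EMA sensor coordinates, a test utterance with MFCC frames only, Euclidean frame distances, and the basic symmetric DTW step pattern. The function names dtw_path, invert and rmse are hypothetical.

```python
import numpy as np


def dtw_path(a, b):
    """Align two feature sequences (frames x dims) with basic DTW.

    Returns the (i, j) index pairs on the minimum-cost warping path.
    Euclidean distance and the symmetric step pattern are assumptions.
    """
    n, m = len(a), len(b)
    # Pairwise Euclidean distances between all frames of a and b.
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i, j] = dist[i - 1, j - 1] + min(
                cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]
            )
    # Backtrack from the last frame pair to the first.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = (cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
        k = int(np.argmin(steps))
        if k == 0:
            i, j = i - 1, j - 1
        elif k == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]


def invert(test_mfcc, ref_mfcc, ref_ema):
    """Estimate articulator positions for each test frame.

    Each test frame receives the mean of the reference EMA frames
    that the warping path aligns it with (an assumed mapping rule).
    """
    est = np.zeros((len(test_mfcc), ref_ema.shape[1]))
    counts = np.zeros(len(test_mfcc))
    for i, j in dtw_path(test_mfcc, ref_mfcc):
        est[i] += ref_ema[j]
        counts[i] += 1
    return est / counts[:, None]


def rmse(estimated, measured):
    """Root mean square error over all frames and coordinates."""
    return float(np.sqrt(np.mean((estimated - measured) ** 2)))
```

In this sketch, invert would be called with the MFCC frames of a new recording and a stored reference pair, and rmse would compare the estimate against EMA measurements taken for that recording.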
License
Copyright (c) 2016 University of Applied Sciences in Tarnow, Poland & Authors
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.