Abstract
Communication impairments have a broad spectrum of medical causes, such as speech disorders, hearing loss, brain injury, stroke, and physical impairments. These impairments can affect social development and interpersonal relationships. Speech impairments can benefit from early speech treatment; however, the majority of rehabilitation facilities across the world still carry out this process manually. A wide range of studies has been conducted on speech processing for various human languages, and machine learning and deep learning have been applied in the medical and healthcare industry to enhance rehabilitation. This study aims to develop a new neural network that can distinguish between vowels spoken by people without speech disorders, by patients with speech disorders, and by a mix of the two groups, using six and twelve classes of Malay vowels. Sound is converted to images before being presented to the neural network, a newly proposed concept named image-profiled data. The study performed a comparative analysis of the classification accuracy of the designed network against pre-trained models (VGG-Net, AlexNet, and Inception). The image-profiled datasets built from spectrograms and Mel-frequency cepstral coefficients (MFCCs) produced the study's best accuracy. The designed network model, trained with a batch size of 6, 20 epochs, and Adam as the optimizer, achieved the highest accuracy for both the six-class and twelve-class settings on image-profiled audio data in the analyses conducted.
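As a rough illustration of turning sound into an image, the following minimal sketch computes a log-magnitude spectrogram with NumPy and scales it to an 8-bit grayscale array. It is not the authors' pipeline: the function name `log_spectrogram`, the frame length, the hop size, the 80 dB dynamic range, and the synthetic 1 kHz test tone are all assumptions chosen for the example, and the MFCC step is not shown.

```python
import numpy as np

def log_spectrogram(signal, frame_len=256, hop=128):
    """Return a log-magnitude spectrogram as a uint8 image (frequency x time).

    frame_len and hop are illustrative values, not settings from the study.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and apply the window to each.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # Magnitude of the one-sided FFT of every frame.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Convert to decibels; the small constant avoids log(0).
    log_mag = 20.0 * np.log10(magnitude + 1e-10)
    # Limit the dynamic range to 80 dB below the peak (assumed value).
    log_mag = np.maximum(log_mag, log_mag.max() - 80.0)
    # Scale to 0..255 so the result can be stored as a grayscale image.
    scaled = (log_mag - log_mag.min()) / (log_mag.max() - log_mag.min() + 1e-12)
    return (scaled * 255).astype(np.uint8).T

# Synthetic stand-in for a recording: one second of a 1 kHz tone at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

image = log_spectrogram(tone)
print(image.shape)  # rows are frequency bins, columns are time frames
```

An array like `image` could then be saved as a picture or passed to an image classifier; how the study sized, coloured, or combined its spectrogram and MFCC images is not stated in this abstract.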
Keywords
Convolutional neural network (CNN), Deep learning, Mel-frequency cepstral coefficient (MFCC), Rehabilitation, Spectrogram, Vowel recognition
Subject Area
Computer Science
First Page
319
Last Page
334
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.
Receive Date
6-8-2023
Revise Date
2-2-2024
Accept Date
2-4-2024
How to Cite this Article
Azhar, Nur Syahmina Ahmad; Hashim, Nik Mohd Zarifie; Ibrahim, Masrullizam Mat; and Sulistiyo, Mahmud Dwi (2025) "Vowel Recognition for Rehabilitation Assessment of Speech Disorder Patients via Multi-source Frequency Spectrum Images," Baghdad Science Journal: Vol. 22: Iss. 1, Article 28.
DOI: 10.21123/bsj.2024.9202
Available at: https://bsj.researchcommons.org/home/vol22/iss1/28