Speech Emotion Recognition based on Improved SOAR Model

Document Type: Research Article

Authors

1 Department of Computer Engineering, Rasht Branch, Islamic Azad University, Rasht, Iran.

2 Department of Computer Engineering, Fouman and Shaft Branch, Islamic Azad University, Fouman, Iran.

3 Department of Computer Engineering, Lahijan Branch, Islamic Azad University, Lahijan, Iran.

DOI: 10.22108/jcs.2024.140141.1138

Abstract

In recent years, emotion recognition has attracted researchers' attention as a new method of human-computer interaction, and automatic speech emotion recognition has become a practical way to increase engagement in many industries. Emotion recognition based on audio information is expected to yield better accuracy. This article presents an efficient method for recognizing emotional states from speech signals, based on a new cognitive model that combines deep learning with the SOAR cognitive architecture. The model is implemented in two main steps. The first step reads the video, converts it to images, and preprocesses them. The second step uses a combination of a convolutional neural network (CNN) and learning automata (LA) to classify the data and determine the emotion recognition rate. The CNN was chosen because no dimension of the speech signal is discarded, and taking the temporal information of dynamic speech into account leads to more efficient and accurate classification. In addition, during training the LA adjusts how the CNN's backpropagation error is applied, which increases the efficiency of the proposed model and implements the working-memory component of the SOAR model. Experiments were run on two audio databases widely used in multimodal emotion recognition, eNTERFACE'05 and SAVEE. In the best case, the recognition accuracy of the presented model is 85.3% on eNTERFACE'05 and 84.5% on SAVEE.
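To make the first step concrete for speech input, the frames fed to the classifier are typically spectrogram "images" computed from the audio track. The following is a minimal sketch, assuming the audio has already been demuxed from the eNTERFACE'05 or SAVEE video clips into mono WAV files (e.g. with ffmpeg) and using librosa for a log-mel transform; the paper's actual preprocessing may differ in sampling rate, resolution, and normalization.

```python
# Hypothetical preprocessing sketch: turn one utterance into a 64x64
# log-mel spectrogram "image" suitable as CNN input. All parameter
# choices (16 kHz, 64 mels, min-max scaling) are illustrative assumptions.
import librosa
import numpy as np

def wav_to_spectrogram_image(path: str, size: int = 64) -> np.ndarray:
    y, sr = librosa.load(path, sr=16000)                 # mono, 16 kHz assumed
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=size)
    img = librosa.power_to_db(mel, ref=np.max)           # log (dB) scale
    img = (img - img.min()) / (img.max() - img.min() + 1e-8)  # scale to [0, 1]
    if img.shape[1] < size:                              # pad short clips ...
        img = np.pad(img, ((0, 0), (0, size - img.shape[1])))
    return img[:, :size]                                 # ... and crop long ones
```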
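The second step, in which learning automata modulate how the backpropagation error updates the CNN, can be read as an automaton selecting among update-strength actions and being reinforced when training improves. The sketch below is one hypothetical PyTorch interpretation: a small CNN over spectrogram images like those above, plus a linear reward-inaction (L_R-I) automaton that picks a learning-rate multiplier at each step and is rewarded whenever the loss decreases. The architecture, action set, and reward signal are assumptions for illustration, not the authors' exact configuration.

```python
# Minimal sketch of CNN training with a learning automaton (LA) adjusting
# the effective update strength; all hyperparameters are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpectrogramCNN(nn.Module):
    """Small CNN over (1, 64, 64) spectrogram images; 6 emotion classes assumed."""
    def __init__(self, n_classes: int = 6):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, 3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        self.fc = nn.Linear(32 * 16 * 16, n_classes)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)   # -> (16, 32, 32)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)   # -> (32, 16, 16)
        return self.fc(x.flatten(1))

class LinearRewardInactionLA:
    """L_R-I automaton over a discrete action set (here: LR multipliers)."""
    def __init__(self, n_actions: int, alpha: float = 0.1):
        self.p = torch.full((n_actions,), 1.0 / n_actions)  # action probabilities
        self.alpha = alpha

    def choose(self) -> int:
        return int(torch.multinomial(self.p, 1))

    def reward(self, a: int):
        # Reinforce the chosen action; on penalty do nothing (reward-inaction).
        self.p *= 1.0 - self.alpha
        self.p[a] += self.alpha

def train_step(model, opt, la, scales, x, y, prev_loss):
    a = la.choose()
    for g in opt.param_groups:
        g["lr"] = 1e-3 * scales[a]      # LA modulates how strongly the
    opt.zero_grad()                     # backpropagated error is applied
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    opt.step()
    if prev_loss is not None and loss.item() < prev_loss:
        la.reward(a)                    # environment response: loss went down
    return loss.item()

if __name__ == "__main__":
    torch.manual_seed(0)
    model = SpectrogramCNN()
    opt = torch.optim.SGD(model.parameters(), lr=1e-3)
    la = LinearRewardInactionLA(n_actions=3)
    scales = [0.5, 1.0, 2.0]            # candidate learning-rate multipliers
    x = torch.randn(8, 1, 64, 64)       # stand-in batch of spectrogram images
    y = torch.randint(0, 6, (8,))
    loss = None
    for _ in range(5):
        loss = train_step(model, opt, la, scales, x, y, loss)
        print(f"loss={loss:.4f}  p={[round(v, 3) for v in la.p.tolist()]}")
```

Under this reading, the automaton converges toward the update strength that the training environment rewards most often, which is one plausible way an LA could tune backpropagation as the abstract describes.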
