Emotional speech recognition (ESR) has significant role in human-computer interaction. ESR methodology involves audio segmentation for selecting units to analyze, extract features relevant to emotion, and finally perform a classification process. Previous research assumed that a single utterance was the unit of analysis. They believed that the emotional state remained constant during the utterance, even though the emotional state could change over time, even within a single utterance. As a result, using an utterance as a single unit is ineffective for this purpose. The study’s goal is to discover a new voiced unit that can be utilized to improve ESR accuracy. Several voiced units based on voiced segments were investigated. To determine the best-voiced unit, each unit is evaluated using an ESR based on a support vector machine classifier. The proposed method was validated using three datasets: EMO-DB, EMOVO, and SAVEE. Experimental results revealed that a voiced unit with five-voiced segments has the highest recognition rate. The emotional state of the overall utterance is decided by a majority vote of its parts’ emotional states. The proposed method outperforms the traditional method in terms of classification outcomes. EMO-DB, EMOVO, and SAVEE improve their recognition rates by 12%, 27%, and 23%, respectively.
Elbarougy, Reda; M El-Badry, Noha; and Nagy ElBedwehy, Mona
"An Improved Speech Emotion Classification Approach Based on Optimal Voiced Unit,"
Information Sciences Letters: Vol. 11
, PP -.
Available at: https://digitalcommons.aaru.edu.jo/isl/vol11/iss4/1