With the rapid development of AI, DL-based speech emotion recognition has been studied extensively, and the technology is already in use and has produced some results. Mellouk et al. (2020) studied facial emotion recognition (FER) through DL and reviewed techniques that interpret and encode facial expressions and extract their features so that computers can predict emotions more accurately. Their results indicate that DL is an effective approach. Masud et al. (2020) studied DL-based intelligent face recognition in an IoT-cloud environment and compared the performance of their proposed model with that of state-of-the-art face recognition models. The experimental results show that the accuracy of the proposed model can reach 98.65%. Li et al. (2020) studied the methods and performance of DL-based medical image fusion. The results show that DL can automatically extract the most effective features from the data, that these features can be used in image fusion to improve the efficiency and accuracy of image processing, and that increasing the amount of training data improves training accuracy. Xiong et al. (2021) studied DL-based plant phenotypic image recognition, in which the convolutional neural network (CNN), deep belief network (DBN), and recurrent neural network (RNN) are used to identify plant species and diagnose plant diseases. Their research shows that DL has broad application prospects and great value in the era of smart agriculture and big data. Yang et al. (2021) studied image-based recognition of wind turbine blade damage using transfer learning and an ensemble learning classifier, and proposed a new DL-based method for blade damage detection. The performance of the proposed model was verified on images of wind turbine blades, and it outperformed the support vector machine (SVM), a basic DL model, and DL combined with an ensemble learning method.
In summary, DL has been applied to speech and image recognition, and the results of these studies show that it has great application value in both areas.