0% Complete
English
صفحه اصلی
/
چهاردهمین کنفرانس بین المللی فناوری اطلاعات و دانش
Enhancing Supervised Learning in Speech Emotion Recognition through Unsupervised Representations
نویسندگان :
Niloufar Faridani
1
Amirali Soltani Tehrani
2
Ramin Toosi
3
1- دانشکده برق و کامپیوتر دانشگاه تهران
2- دانشکده برق و کامپیوتر دانشگاه تهران
3- دانشکده برق و کامپیوتر دانشگاه تهران
کلمات کلیدی :
Speech Emotion Recognition،Self-supervised Learning،Convolutional Neural Network
چکیده :
Speech Emotion Recognition (SER) is pivotal in enhancing human-computer interaction by enabling a deeper understanding of emotional states across various applications, contributing to more empathetic and effective communication. This study proposes an innovative approach integrating self-supervised feature extraction with supervised classification for emotion recognition from small audio segments. In the preprocessing step, to eliminate the need to craft audio features, we employed a self-supervised feature extractor based on the Wav2Vec model to capture acoustic features from audio data. Then, the output feature maps of the preprocessing step are fed to a custom-designed Convolutional Neural Network (CNN)–-based model to perform emotion classification. Utilizing the ShEMO dataset as our testing ground, the proposed method surpasses two baseline methods, i.e., support vector machine classifier and transfer learning of a pre-trained CNN. Comparing the proposed method to the state-of-the-art techniques in the SER task indicates the superiority of the proposed method. Our findings underscore the pivotal role of deep unsupervised feature learning in elevating the landscape of SER, offering enhanced emotional comprehension in the realm of human-computer interactions.
لیست مقالات
لیست مقالات بایگانی شده
PeCoQ: A Dataset for Persian Complex Question Answering over Knowledge Graph
Romina Etezadi - Mehrnoush Shamsfard
Benchmarking Embedding Models for Persian-Language Semantic Information Retrieval
Mahmood Kalantari - Mehdi Feghhi - Nasser Mozayani
LuckyAgent2022: A Stop-Learning Multi-Armed Bandit Automated Negotiating Agent
Arash Ebrahimnezhad - Faria Nassiri-Mofakham
Improving Deep Neural Network Accelerator for Malaria Diseased Blood Cells using FPGA
Hadi Rezaeikarjani - Mojtaba Valinataj
تخلیهبار محاسباتی ریزدانه تحرکآگاه در رایانش لبه برای اینترنت اشیاء
شکوفه نوروزی - دکتر زینب موحدی شکوفه نوروزی - زینب موحدی -
پیش بینی گره های رهبر در شبکه های اجتماعی با استفاده از پیش بینی پیوند
روح اله رشیدی - فرساد زمانی بروجنی - محمد رضا سلطان آقایی - هادی فرهادی
خوشه بندی مقید داده ها به کمک اتوماتای یادگیر سلولی
شکوفه علی محمدی - احمدعلی آبین
GanjNet: Leveraging Network Modeling with Large Language Models for Persian Word Sense Induction
Amir Mohammad Kouyeshpour - Hadi Veisi - Saman Haratizadeh
Classification and Evaluation of Privacy Preserving Data Mining Methods
Negar Nasiri - Mohammadreza Keyvanpour
Using Deconvolutional Variational Autoencoder for Answer Selection in Community Question Answering
Golshan Afzali Boroujeni - Heshaam Faili
بیشتر
ثمین همایش، سامانه مدیریت کنفرانس ها و جشنواره ها - نگارش 42.2.4