0% Complete
فارسی
Home
/
شانزدهمین کنفرانس بین المللی فناوری اطلاعات و دانش
Enhancing Persian Speech Emotion Recognition with Contrastive Learning and Multimodal Fusion
Authors :
Mobina Esmaeili
1
Vajiheh Sabeti
2
1- دانشگاه الزهرا(س)
2- دانشگاه الزهرا(س)
Keywords :
Multimodal Emotion Recognitiont،Representation Learning،Representation Learning،Speech-Text Fusion،ShEMO Dataset
Abstract :
Emotion recognition from both speech and text in low-resource languages such as Persian presents significant challenges due to linguistic complexity and the scarcity of labeled datasets. Conventional multimodal fusion methods often struggle to capture nuanced cross-modal interactions and typically neglect inter-class emotional relationships. To address these limitations, this paper introduces a novel contrastive learning framework that employs pre-trained projection networks to enhance multimodal representations through a combination of intra-modal, inter-modal, and semi-contrastive objectives. The refined embeddings are integrated via a lightweight fusion layer for final emotion classification. In addition, an automatic speech recognition (ASR) system is incorporated to enrich textual inputs and improve linguistic diversity. Experiments on the ShEMO corpus demonstrate that the proposed approach achieves an accuracy of 83.04% and an unweighted average recall (UAR) of 88.1%, substantially outperforming traditional fusion-based baselines. These results confirm the effectiveness of the framework in improving cross-modal alignment and representation quality, highlighting its potential for intelligent interactive systems, social media sentiment analysis, and automated affective computing applications.
Papers List
List of archived papers
Enhancing Software Effort Estimation with an Integrated Approach of Particle Swarm Optimization and Genetic Algorithms in Analogy-based Method
Ehsan Nasr - Keyvan Mohebbi
تشخیص بیماری شبکوری با استفاده از ترکیب الگوریتمهای یادگیری عمیق
میثم فتاحی
A Deep Learning Framework for Phase-Aware Feature Representation to Improve Sound Source Direction and Distance Estimation
Zahra Abolfazli - Hamid Reza Abutalebi
Cryptanalysis of two password authenticated key exchange schemes
Mohammad Ali Poorafsahi - Hamid Mala
Load Balancing in Software-Defined Networks Using Multi-Level Thresholds and Hybrid Switch Migration Strategies
Alireza Karimi - Mohammad yousef Darmani
Real-Time EEG-Based Analysis Of Stress-Inducing Stimuli
Mohsen Mahmoudi - Fattaneh Taghiyareh - Yasamin Akhavein - Elnaz Ghorbani
استخراج ویژگی مجموعه دادههای پزشکی دارای ابعاد بالا با استفاده از برنامه نویسی ژنتیک چند منظوره
سحر فقیهی راد - دکتر سیده نفیسه آل محمد سحر فقیهی راد - سیده نفیسه آل محمد -
ارائه راهکاری جهت مقابله با حملات DoS در شبکه های نرم افزارمحور
ویدا هاشمی - احمد بختیاری شهری - رضا جاویدان
Identifying Children's Personality Styles through Drawing Analysis using Machine Learning
Maedeh Mosharraf - Faezeh Banabazi
A Survey on Utilizing Reinforcement Learning in Wireless Sensor Networks Routing Protocols
Ali Forghani Elah Abadi - Seyedeh Elham Asghari - Sepideh Sharifani - Seyyed Amir Asghari - Mohammadreza Binesh Marvasti
more
Samin Hamayesh - Version 43.8.0