16th International Conference on Information and Knowledge Technology
Enhancing Persian Speech Emotion Recognition with Contrastive Learning and Multimodal Fusion
Authors :
Mobina Esmaeili (1)
Vajiheh Sabeti (2)
1- Alzahra University
2- Alzahra University
Keywords :
Multimodal Emotion Recognition, Representation Learning, Speech-Text Fusion, ShEMO Dataset
Abstract :
Emotion recognition from both speech and text in low-resource languages such as Persian presents significant challenges due to linguistic complexity and the scarcity of labeled datasets. Conventional multimodal fusion methods often struggle to capture nuanced cross-modal interactions and typically neglect inter-class emotional relationships. To address these limitations, this paper introduces a novel contrastive learning framework that employs pre-trained projection networks to enhance multimodal representations through a combination of intra-modal, inter-modal, and semi-contrastive objectives. The refined embeddings are integrated via a lightweight fusion layer for final emotion classification. In addition, an automatic speech recognition (ASR) system is incorporated to enrich textual inputs and improve linguistic diversity. Experiments on the ShEMO corpus demonstrate that the proposed approach achieves an accuracy of 83.04% and an unweighted average recall (UAR) of 88.1%, substantially outperforming traditional fusion-based baselines. These results confirm the effectiveness of the framework in improving cross-modal alignment and representation quality, highlighting its potential for intelligent interactive systems, social media sentiment analysis, and automated affective computing applications.
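The abstract reports both accuracy and unweighted average recall (UAR). UAR is the mean of the per-class recalls, so every emotion class counts equally regardless of how many samples it has, and the two metrics can diverge when classes are imbalanced. The following is a minimal sketch of both metrics in plain Python. The function names and the toy labels are hypothetical and chosen only for illustration; they are not taken from the paper or from the ShEMO corpus.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recall; each class has equal weight."""
    total = defaultdict(int)  # number of true samples per class
    hit = defaultdict(int)    # number of correctly predicted samples per class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            hit[t] += 1
    recalls = [hit[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy labels (illustration only, not ShEMO data).
y_true = ["neutral"] * 6 + ["anger"] * 3 + ["sadness"] * 1
y_pred = ["neutral"] * 4 + ["anger"] * 2 + ["anger"] * 3 + ["sadness"] * 1

print(f"accuracy = {accuracy(y_true, y_pred):.4f}")
print(f"UAR      = {unweighted_average_recall(y_true, y_pred):.4f}")
```

In this toy case the errors fall only on the majority class, so accuracy (8 of 10 correct) is lower than UAR (the mean of the recalls 4/6, 3/3, and 1/1). This shows one way UAR can exceed accuracy, but it says nothing about where the reported model's errors actually occur.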