0% Complete
فارسی
Home
/
شانزدهمین کنفرانس بین المللی فناوری اطلاعات و دانش
A Deep Learning Framework for Phase-Aware Feature Representation to Improve Sound Source Direction and Distance Estimation
Authors :
Zahra Abolfazli
1
Hamid Reza Abutalebi
2
1- Yazd University
2- Yazd University
Keywords :
(Sound Event Localization and Detection (SELD،phase spectrogram features،Conformer
Abstract :
This paper proposes a novel refinement of the network’s input features to improve distance and Direction-of-Arrival estimation in the sound event localization and detection system. Instead of relying on Mel energies, we propose using phase spectrograms as input feature, which effectively preserve inter-channel time delays and capture crucial wave propagation characteristics. Furthermore, we introduce architectural improvements for increased robustness. Specifically, Huber loss replaces MSE, reducing sensitivity to noise. Additionally, MHSA layers are replaced with Conformer blocks to better model both long-range dependencies and local interactions within the audio data. Our experimental results validate the effectiveness of the proposed phase-based feature representation and optimized architecture, demonstrating improvements in both DOA and distance estimation.
Papers List
List of archived papers
UltraLearn: Next-Generation CyberSecurity Learning Platform
Saeed Raisi - Saeid Ghasemshirazi - Ghazaleh Shirvani
PC-MCLD: Pose-Constrained and Multi-focal Conditioned Latent Diffusion for Person Image Synthesis
Hanieh Fazli - Reza Azmi
Classification of mental states of human concentration based on EEG signal
Mehran Safari Dehnavi - Vahid Safari Dehnavi - Dr Masoud Shafiee
Intent-Based Classification of Multi-Stage Cyber Attacks Using Attacker TTPs and Machine Learning
Fatemeh Imanimehr - Hamed Ebrahimi
Heart Sound Classification based on Group-based Sparse Features of PCG Signal
Zahra Hossein-Nejad - Mehdi Nasri
Load Balancing in Software-Defined Networks Using Multi-Level Thresholds and Hybrid Switch Migration Strategies
Alireza Karimi - Mohammad yousef Darmani
Attention-Enhanced Ensemble Learning for Automated Stenosis Detection in X-ray Coronary Angiography Videos
Marzieh Sadat Hosseini - Ahmad R. Naghsh-Nilchi - Mehran Safayani - Masoumeh Sadeghi
Improved Weighting in the Automated Texts Classification using Fuzzy Method
Hamidreza Sadrarhami - S. Mohammadali Zanjani - Ghazanfar Shahgholian
روش مهاجرت خوشهای برای بهبود بستربندی به مشتری در گردشکارهای بدون سرویسدهنده
محمدامین قسوری جهرمی - مهرداد آشتیانی - فاطمه بخشی
Multi-Modal Longitudinal Tooth Labeling with Temporal Graph–Transformer Integration
Maral Mirza mohammadi - Mahdi Tarom
more
Samin Hamayesh - Version 42.5.2