15th International Conference on Information and Knowledge Technology
Benchmarking Embedding Models for Persian-Language Semantic Information Retrieval
Authors:
Mahmood Kalantari 1
Mehdi Feghhi 2
Nasser Mozayani 3
1- Iran University of Science and Technology
2- Iran University of Science and Technology
3- Iran University of Science and Technology
Keywords:
Embedding search, Embedding models, Persian embedding, Persian question-answering, Retrieval-Augmented Generation (RAG)
Abstract:
The increasing reliance on semantic retrieval, especially in chatbots powered by large language models, underscores the need for robust evaluation of embedding models. This study investigates the performance of embedding models for Persian-language information retrieval, an area with limited prior research. Four question-answering datasets were used: two publicly available datasets adapted for this study and two custom datasets derived from translations. A systematic evaluation of 17 embedding models was conducted, and the models were ranked by their accuracy in retrieving relevant content using similarity measures such as dot product, cosine similarity, and L2 distance. The findings highlight how well these models adapt to diverse textual data and address the specific challenges posed by the Persian language. This research fills a gap in Persian-language retrieval by providing a comprehensive benchmark for evaluating embedding models in semantic information retrieval scenarios.
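The three similarity measures named in the abstract can rank the same passages differently, which is why a benchmark may report them separately. The following is a minimal sketch, not the authors' evaluation code: the function names (`rank_passages`, `top_k_accuracy`), the toy two-dimensional vectors, and the use of top-k accuracy as the scoring rule are all assumptions made for illustration. In practice the vectors would come from an embedding model.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    """Cosine similarity: dot product of the length-normalized vectors."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def l2_distance(u, v):
    """Euclidean (L2) distance; smaller means more similar."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def rank_passages(query_vec, passage_vecs, measure="cosine"):
    """Return passage indices ordered from most to least relevant.

    Hypothetical helper: 'dot' and 'cosine' sort descending by score,
    'l2' sorts ascending by distance.
    """
    if measure == "dot":
        scores = [dot(query_vec, p) for p in passage_vecs]
        descending = True
    elif measure == "cosine":
        scores = [cosine(query_vec, p) for p in passage_vecs]
        descending = True
    elif measure == "l2":
        scores = [l2_distance(query_vec, p) for p in passage_vecs]
        descending = False
    else:
        raise ValueError(f"unknown measure: {measure}")
    return sorted(range(len(passage_vecs)), key=lambda i: scores[i],
                  reverse=descending)

def top_k_accuracy(query_vecs, passage_vecs, gold, k=1, measure="cosine"):
    """Fraction of queries whose gold passage appears in the top k results.

    Assumed metric for illustration; the abstract does not specify one.
    """
    hits = sum(
        1 for q, g in zip(query_vecs, gold)
        if g in rank_passages(q, passage_vecs, measure)[:k]
    )
    return hits / len(query_vecs)

# Toy data (not real model output): passage 2 has a large norm,
# so it dominates the unnormalized dot product.
passages = [[1.0, 0.0], [0.0, 1.0], [3.0, 3.0]]
queries = [[0.9, 0.1], [0.1, 0.9]]
gold = [0, 1]  # index of the correct passage for each query

for m in ("dot", "cosine", "l2"):
    print(m, rank_passages(queries[0], passages, m),
          top_k_accuracy(queries, passages, gold, k=1, measure=m))
```

On this toy data the dot product favors the long vector regardless of direction, while cosine similarity and L2 distance both retrieve the gold passage first. When embeddings are normalized to unit length, all three measures produce the same ranking.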