15th International Conference on Information and Knowledge Technology
Benchmarking Embedding Models for Persian-Language Semantic Information Retrieval
Authors :
Mahmood Kalantari 1, Mehdi Feghhi 2, Nasser Mozayani 3
1- Iran University of Science and Technology
2- Iran University of Science and Technology
3- Iran University of Science and Technology
Keywords :
Embedding search, Embedding models, Persian embedding, Persian question-answering, Retrieval-Augmented Generation (RAG)
Abstract :
The increasing reliance on semantic retrieval, especially in chatbots powered by large language models, underscores the need for robust evaluation of embedding models. This study investigates the performance of embedding models for Persian-language information retrieval, an area with limited prior research. Four question-answering datasets were used: two publicly available datasets adapted for this study and two custom datasets derived from translations. A systematic evaluation of 17 embedding models was conducted, and the models were ranked by their accuracy in retrieving relevant content using similarity measures such as dot product, cosine similarity, and L2 distance. The findings highlight how well these models adapt to diverse textual data and to the specific challenges posed by the Persian language. This research fills a gap in Persian-language retrieval by providing a comprehensive benchmark for evaluating embedding models in semantic information retrieval scenarios.
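The ranking step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the evaluation protocol, so everything here is an assumption made for illustration: the function names (`top_k`, `top_k_accuracy`), the use of top-k accuracy as the metric, and the toy two-dimensional vectors, which stand in for real model embeddings. L2 distance is negated so that a larger score means "more similar" for all three measures.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    # Dot product of the two vectors after length normalization.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def neg_l2(a, b):
    # Negated Euclidean distance, so larger values mean closer vectors.
    return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

MEASURES = {"dot": dot, "cosine": cosine, "l2": neg_l2}

def top_k(query, passages, measure, k=1):
    """Return indices of the k passages scoring highest against the query."""
    score = MEASURES[measure]
    ranked = sorted(range(len(passages)),
                    key=lambda i: score(query, passages[i]),
                    reverse=True)
    return ranked[:k]

def top_k_accuracy(queries, gold, passages, measure, k=1):
    """Fraction of queries whose gold passage index appears in the top k."""
    hits = sum(1 for q, g in zip(queries, gold)
               if g in top_k(q, passages, measure, k))
    return hits / len(queries)

# Hypothetical toy data: passage 2 has a large norm but is less aligned
# with either query than the gold passages 0 and 1.
passages = [[1.0, 0.0], [0.0, 1.0], [3.0, 3.0]]
queries = [[0.9, 0.1], [0.2, 0.8]]
gold = [0, 1]

for name in MEASURES:
    print(name, top_k_accuracy(queries, gold, passages, name, k=1))
```

On this toy data the unnormalized dot product prefers the long passage vector and misses both gold passages, while cosine similarity and L2 distance retrieve them. This is one reason a benchmark may report results per similarity measure, although nothing here reflects the paper's actual results.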