0% Complete
فارسی
Home
/
پانزدهمین کنفرانس بین المللی فناوری اطلاعات و دانش
Benchmarking Embedding Models for Persian-Language Semantic Information Retrieval
Authors :
Mahmood Kalantari
1
Mehdi Feghhi
2
Nasser Mozayani
3
1- دانشگاه علم و صنعت ایران
2- دانشگاه علم و صنعت ایران
3- دانشگاه علم و صنعت ایران
Keywords :
Embedding search،Embedding models،Persian embedding،Persian question-answering،Retrieval-Augmented Generation (RAG)
Abstract :
The increasing reliance on semantic-based retrieval, especially in the context of large language model-powered chatbots, underscores the need for robust evaluation of embedding models. In this study, the performance of embedding models for Persian-language information retrieval was investigated, addressing an area with limited prior research. Four question-answering datasets were used—two publicly available datasets adapted for this study and two custom datasets derived from translations. A systematic evaluation of 17 embedding models was conducted, and the models were ranked based on their accuracy in retrieving relevant content using similarity measures such as dot product, cosine similarity, and L2 distance. The findings emphasize the adaptability of these models to diverse textual data and address the specific challenges posed by the Persian language. This research bridges a critical gap in Persian-language retrieval tasks, providing a comprehensive benchmark for evaluating embedding models in semantic information retrieval scenarios.
Papers List
List of archived papers
یادگیری فناورانه و بینالمللیسازی سکوهای پیامرسان: چارچوبی برای بازیگران متأخر
علیرضا کبیری فرد - علی ولی زاده - مهدی مجیدپور
Recommendation Systems in Smart Agriculture: Pathway to a well-designed system
Ahmad Nameni - Amir Ghafarian Daneshmand - Omid Mahdi Ebadati E
Explainable AI for Medical Image Diagnosis Using Hybrid Attention-CAM Mechanisms
Negin Amirzadeh
فراتر از ارزیابی: استفاده استراتژیک از نظریه بازی برای بازتعریف سازوکارهای همتاسنجی
سیده فاطمه نورانی - سحر مقراضی
An Efficient Link Prediction Method using Community Structures
Dr Hadi Shakibian - Setareh Mokhtari
Improving Privacy Protection in a Collaborative Blockchain-based E-Health Records System
Arman Emam-Hoseini - Samane Sobuti - دکتر سیاوش خرسندی - Alireza Hashemi-Golpayeghani
An Adaptive Mutation-Enhanced EHO-SVM Framework for Intrusion Detection in IoMT Environments
Amirhossein Damia - Erfaneh Khanmohammadi
بهبود معاملات الگوریتمی سهام مبتنی بر رویکرد یادگیری تقویتی
مها العطوان - جعفر پورامینی
Automatic identification and reconstruction of Tuberculosis in microscopic images using convolutional auto-encoder network
Ahmad Reza Nadafi - Farahnaz Mohanna
A Novel Decentralized Privacy Preserving Federated Learning Model for Healthcare Applications
Saba Ameri - Reza Ebrahimi Atani
more
Samin Hamayesh - Version 42.5.2