15th International Conference on Information and Knowledge Technology
Benchmarking Embedding Models for Persian-Language Semantic Information Retrieval
Authors:
Mahmood Kalantari (1), Mehdi Feghhi (2), Nasser Mozayani (3)
1- Iran University of Science and Technology
2- Iran University of Science and Technology
3- Iran University of Science and Technology
Keywords:
Embedding search, Embedding models, Persian embedding, Persian question-answering, Retrieval-Augmented Generation (RAG)
Abstract:
The growing reliance on semantic retrieval, especially in chatbots powered by large language models, underscores the need for robust evaluation of embedding models. This study investigates the performance of embedding models for Persian-language information retrieval, an area with limited prior research. Four question-answering datasets were used: two publicly available datasets adapted for this study and two custom datasets derived from translations. Seventeen embedding models were evaluated systematically and ranked by their accuracy in retrieving relevant content, using similarity and distance measures such as dot product, cosine similarity, and L2 (Euclidean) distance. The findings highlight the adaptability of these models to diverse textual data and address challenges specific to the Persian language. By filling a gap in Persian-language retrieval research, this work provides a comprehensive benchmark for evaluating embedding models in semantic information retrieval scenarios.
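The ranking procedure described in the abstract can be illustrated with a minimal sketch. It assumes that queries and passages have already been turned into vectors by the embedding model under test, that each query has exactly one relevant (gold) passage, and that "accuracy" means top-k retrieval accuracy. The function names (`top_k_accuracy`, `rank_by_accuracy`), the `SCORERS` registry, and the toy 2-D vectors are hypothetical illustrations, not the authors' code or data. L2 distance is negated so that all three measures share the convention "higher score means closer".

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Scorer = Callable[[Vector, Vector], float]


def dot(a: Vector, b: Vector) -> float:
    """Dot (inner) product: higher means more similar."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity: dot product divided by both vector lengths.
    Assumes neither vector is all zeros."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def neg_l2(a: Vector, b: Vector) -> float:
    """Negated Euclidean (L2) distance, so that higher means closer, like the other scorers."""
    return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical registry of the three measures named in the abstract.
SCORERS: Dict[str, Scorer] = {"dot": dot, "cosine": cosine, "l2": neg_l2}


def top_k_accuracy(queries: Sequence[Vector], passages: Sequence[Vector],
                   gold: Sequence[int], score: Scorer, k: int = 1) -> float:
    """Fraction of queries whose gold passage index is among the k highest-scoring passages."""
    hits = 0
    for query, gold_idx in zip(queries, gold):
        # Rank every passage index by its score against this query, best first.
        ranked = sorted(range(len(passages)),
                        key=lambda i: score(query, passages[i]), reverse=True)
        if gold_idx in ranked[:k]:
            hits += 1
    return hits / len(queries)


def rank_by_accuracy(accuracies: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort (name, accuracy) pairs from best to worst; ties keep insertion order."""
    return sorted(accuracies.items(), key=lambda item: item[1], reverse=True)


# Toy, made-up 2-D "embeddings" (not data from the paper): two passages, two queries.
PASSAGES = [(1.0, 0.0), (3.0, 3.0)]
QUERIES = [(1.0, 0.2), (0.9, 1.0)]
GOLD = [0, 1]  # index of the relevant passage for each query

if __name__ == "__main__":
    accs = {name: top_k_accuracy(QUERIES, PASSAGES, GOLD, fn) for name, fn in SCORERS.items()}
    for name, acc in rank_by_accuracy(accs):
        print(f"{name}: {acc:.2f}")  # cosine: 1.00, then dot: 0.50, then l2: 0.50
```

On these unnormalized toy vectors the three measures disagree: the dot product favors the longer passage vector for the first query, while L2 favors the nearer point for the second query regardless of direction. If every embedding is scaled to unit length, dot product and cosine similarity coincide and L2 distance yields the same ranking, because ‖a − b‖² = 2 − 2·(a·b) for unit vectors; differences between the measures therefore appear only for models whose outputs are not normalized.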
List of papers
Predicting Concentration of Particulate Matter (PM2.5) in Hamedan using Machine Learning Algorithms
Anita Karim Ghassabpour - Hatam Abdoli - Muharram Mansoorizadeh - Saeid Seyedi
Human Resource Allocation to the Credit Requirement Process, A Process Mining Approach
Omid Mahdi Ebadati - Mohammad Mehrabioun - Shokoofeh Sadat Hosseini
Improving Fog Computing Scalability in Software Defined Network using Critical Requests Prediction in IoT
Hajar Ghanbari
A Blockchain Architecture for Secure, High-Speed P2P Energy Trades with Game-Theoretic Coalition Formation
Amin Aboutalebi Najafabadi - Seyed Hossein Hosseinian
A Multi Objective & Trust-Based Workflow Scheduling Method In Cloud Computing Based On The MVO Algorithm
Fatemeh Ebadifard
Classical-Quantum Multiple Access Wiretap Channel with Common Message: One-shot Rate Region
Hadi Aghaee - Dr Bahareh Akhbari
To Kill a Mockingbird: Cryptanalysis of an Authenticated Key Exchange Scheme for Drones
Neda Toghraee - Hamid Mala
Applying the Cuckoo Optimization Algorithm and Fuzzy Logic to Improve Task Scheduling in Fog Computing Environments
فاطمه دوامی - حمید جلیلوند - فاطمه نجفی
AI-based Secure Intrusion Detection Framework for Digital Twin-enabled Critical Infrastructure
Tanisha Patel - Nilesh Kumar Jadav - Tejal Rathod - Sudeep Tanwar - Deepak Garg - Hossein Shahinzadeh
IoT-Based Model in Smart Urban Traffic Control: Graph theory and Genetic Algorithm
Saeed Doostali - Seyed Morteza Babamir - Mohammad Shiralizadeh Dezfoli - Behzad Soleimani Neysiani