0% Complete
English
صفحه اصلی
/
پانزدهمین کنفرانس بین المللی فناوری اطلاعات و دانش
Benchmarking Embedding Models for Persian-Language Semantic Information Retrieval
نویسندگان :
Mahmood Kalantari
1
Mehdi Feghhi
2
Nasser Mozayani
3
1- دانشگاه علم و صنعت ایران
2- دانشگاه علم و صنعت ایران
3- دانشگاه علم و صنعت ایران
کلمات کلیدی :
Embedding search،Embedding models،Persian embedding،Persian question-answering،Retrieval-Augmented Generation (RAG)
چکیده :
The increasing reliance on semantic-based retrieval, especially in the context of large language model-powered chatbots, underscores the need for robust evaluation of embedding models. In this study, the performance of embedding models for Persian-language information retrieval was investigated, addressing an area with limited prior research. Four question-answering datasets were used—two publicly available datasets adapted for this study and two custom datasets derived from translations. A systematic evaluation of 17 embedding models was conducted, and the models were ranked based on their accuracy in retrieving relevant content using similarity measures such as dot product, cosine similarity, and L2 distance. The findings emphasize the adaptability of these models to diverse textual data and address the specific challenges posed by the Persian language. This research bridges a critical gap in Persian-language retrieval tasks, providing a comprehensive benchmark for evaluating embedding models in semantic information retrieval scenarios.
لیست مقالات
لیست مقالات بایگانی شده
ML-based Optical Fibre Fault Detection in Smart Surveillance and Traffic Systems
Rushil Patel - Sana Narmawala - Nikunjkumar Mahida - Rajesh Gupta - Sudeep Tanwar - Hossein Shahinzadeh
UltraLearn: Next-Generation CyberSecurity Learning Platform
Saeed Raisi - Saeid Ghasemshirazi - Ghazaleh Shirvani
A Graph Attention-Based Autoencoder for Critical Path Anomaly Detection in Microservices
Mahdi Naderi - Hossein Momeni - Shayan Shahini
Statistical distance-base acceptance strategy for desirable offers in bilateral automated negotiation
Arash Ebrahimnezhad - Dr Hamid Jazayeriy - Dr Faria Nassiri-mofakham
کشف لبه در تصاویر پزشکی با استفاده از اتوماتای سلولی سلسله مراتبی
مریم علینقی زاده - علیرضا رضوانیان
دستهبندی متون خبری فارسی با یادگیری فعال
مینا طباطبائی - دکتر سعیده ممتازی
HTCAR: Hierarchical Text Classification based on aggregation of Representations
Ali Bavand - Mohammad Mehdi Homayounpour - Ahmad Nickabadi
Intelligent Transportation System (ITS) Using Internet of Things (IoT)
Engineer Reza Khalilian - Dr. Abdalhossein Rezai - Dr. Sayyed Mohammad Reza Talakesh
شناسایی وبگاه های دامچینی به کمک شبکه عصبی گسستهساز بردار یادگیر (LVQ)
یگانه ستاری - غلامعلی منتظر
Particle Swarm Optimization-Based Framework for 3D Swarm Robotic Navigation Using Artificial Potential Field Dynamics
Samim Kamyab - Masoud Shirzadeh - Ghoncheh Zand
بیشتر
ثمین همایش، سامانه مدیریت کنفرانس ها و جشنواره ها - نگارش 43.8.0