0% Complete
English
صفحه اصلی
/
پانزدهمین کنفرانس بین المللی فناوری اطلاعات و دانش
Benchmarking Embedding Models for Persian-Language Semantic Information Retrieval
نویسندگان :
Mahmood Kalantari
1
Mehdi Feghhi
2
Nasser Mozayani
3
1- دانشگاه علم و صنعت ایران
2- دانشگاه علم و صنعت ایران
3- دانشگاه علم و صنعت ایران
کلمات کلیدی :
Embedding search،Embedding models،Persian embedding،Persian question-answering،Retrieval-Augmented Generation (RAG)
چکیده :
The increasing reliance on semantic-based retrieval, especially in the context of large language model-powered chatbots, underscores the need for robust evaluation of embedding models. In this study, the performance of embedding models for Persian-language information retrieval was investigated, addressing an area with limited prior research. Four question-answering datasets were used—two publicly available datasets adapted for this study and two custom datasets derived from translations. A systematic evaluation of 17 embedding models was conducted, and the models were ranked based on their accuracy in retrieving relevant content using similarity measures such as dot product, cosine similarity, and L2 distance. The findings emphasize the adaptability of these models to diverse textual data and address the specific challenges posed by the Persian language. This research bridges a critical gap in Persian-language retrieval tasks, providing a comprehensive benchmark for evaluating embedding models in semantic information retrieval scenarios.
لیست مقالات
لیست مقالات بایگانی شده
Conceptual Intelligent Model for Visual Question Answering using Attention Mechanism and Relational Reasoning
ٍElham Alighardash - Dr Hassan Khotanlou - Vahid Pour Amin
Integration of Electric Vehicles in Smart Grid using Deep Reinforcement Learning
Farkhondeh Kiaee
Cryptanalysis of two password authenticated key exchange schemes
Mohammad Ali Poorafsahi - Hamid Mala
Multi-Modal Longitudinal Tooth Labeling with Temporal Graph–Transformer Integration
Maral Mirza mohammadi - Mahdi Tarom
Mode Selection and Resource Allocation in D2D-Enabled MC-NOMA using Matching Theory
Alireza Gholamrezaee - Hamid Farrokhi - Javad Zeraatkar Moghaddam
Using Deconvolutional Variational Autoencoder for Answer Selection in Community Question Answering
Golshan Afzali Boroujeni - Heshaam Faili
ML-based Optical Fibre Fault Detection in Smart Surveillance and Traffic Systems
Rushil Patel - Sana Narmawala - Nikunjkumar Mahida - Rajesh Gupta - Sudeep Tanwar - Hossein Shahinzadeh
An Optimized GBDT-Based Model Using SMOTE for Effective Diagnosis of Coronary Heart Disease
Elahe Moradi - Mohammad Javadian
3D Mesh ONoC: Design of low Insertion Loss and Non-blocking Optical Router and Efficient Routing Algorithm
Sanaz Asadinia - Elham Yaghoubi - Mostafa Sadeghi - Mahdi Mehrabi
A perceptual loss for screen content image super-resolution
Hossein Sekhavaty-Moghadam - Marzieh Hosseinkhani - Dr Azadeh Mansouri
بیشتر
ثمین همایش، سامانه مدیریت کنفرانس ها و جشنواره ها - نگارش 42.5.2