16th International Conference on Information and Knowledge Technology
Design of low-latency Floating-Point units for Softmax Computation in Transformer-based Large Language Models
Authors:
Hoda Ghabeli
1
Amir Sabbagh Molahosseini
2
1- Azad University of Kerman
2- Azad University of Kerman
Keywords:
LLM, transformer, softmax, speculative, floating-point
Abstract:
Large Language Models (LLMs) have emerged over the last decade as some of the most widely used interactive digital tools in the world. Softmax is one of the key steps in LLMs: its output is a vector of probabilities, one for each token in the model's vocabulary. Softmax is time-consuming because the vocabulary is large, which multiplies the number of exponential and normalization operations and slows down the model as a whole. Because both accuracy and speed matter, some of the main softmax operations are performed on floating-point units. Speculative arithmetic applies when the result of a computation can be estimated along a path shorter than the critical path, which yields a speedup. This paper proposes speculative 32-bit floating-point computation for softmax that merges two formats, 32-bit and 16-bit. Both the floating-point adder and the floating-point multiplier use this strategy. Based on the input data of the softmax function, the proposed design speculates that the 32-bit floating-point result can be obtained by concatenating the result of the 16-bit format with a part of the 32-bit format result, which gives the correct result most of the time with less delay. If speculation fails, the longer path through the conventional 32-bit floating-point unit is activated, at the cost of a slightly longer critical path. Experimental results show that the speculative floating-point units reduce delay with only marginal overhead in area and power consumption.
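To make concrete why the abstract singles out the floating-point adder and multiplier, the following is a minimal software sketch of a numerically stable softmax, not the hardware design proposed in the paper. The function names `softmax` and `op_counts` are hypothetical, and the max-subtraction step and the multiply-by-reciprocal normalization are common implementation choices assumed here rather than details taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a non-empty list of floats (illustrative sketch)."""
    m = max(logits)
    # One subtraction and one exponential per vocabulary entry.
    exps = [math.exp(x - m) for x in logits]
    # Accumulate the sum with a chain of floating-point additions.
    total = exps[0]
    for e in exps[1:]:
        total += e
    # Normalize with one reciprocal and one multiplication per entry.
    inv = 1.0 / total
    return [e * inv for e in exps]


def op_counts(vocab_size):
    """Operation counts of the softmax() sketch above for a given vocabulary size."""
    return {
        "sub": vocab_size,       # x - m for every logit
        "exp": vocab_size,       # one exponential per logit
        "add": vocab_size - 1,   # accumulating vocab_size terms
        "div": 1,                # the single reciprocal
        "mul": vocab_size,       # scaling every exponential
    }


probs = softmax([2.0, 1.0, 0.1])
print(probs)
print(op_counts(50000))
```

Every count except the single reciprocal grows linearly with the vocabulary size, so under these assumptions the latency of each addition and multiplication is paid tens of thousands of times per softmax call. That is the cost a shorter speculative path in the adder and multiplier aims to reduce.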
List of papers
Distributed coordination protocol for event data exchange in IoT monitoring applications
Behnam Khazael - Hadi Tabatabaee Malazi
A recommender system for purchasing cosmetics and personal care products based on the random forest algorithm
Fatemeh Ramezani Khuzestani - Majid Rafiei
Sparse Beamforming Design for Non-Coherent UD-CRAN with mm-Wave Fronthaul Links
Alireza M. Hosseini - Dr Abbas Mohammadi
Privacy preservation in publishing sequential releases of social network data with support for edge addition
Tahereh Sarzehi - Dr Mehri Rajaei
ML-based Optical Fibre Fault Detection in Smart Surveillance and Traffic Systems
Rushil Patel - Sana Narmawala - Nikunjkumar Mahida - Rajesh Gupta - Sudeep Tanwar - Hossein Shahinzadeh
Urban traffic volume prediction using data from the Neshan service; case study: Kamal Street, Isfahan
Mahsa Latifi - Jamshid Maleki
Designation and development of Camera Sensor for identification of polymer type
Negin Piri - Ahmad Salehi - Erfan Memarian
Classical-Quantum Multiple Access Wiretap Channel with Common Message: One-shot Rate Region
Hadi Aghaee - Dr Bahareh Akhbari
DynamicEvoStream: Dynamic evolutionary data stream clustering during idle times
Zahra Amighi - Morteza Yousef Sanati - Mir Hossein Dezfoulian
PersianRAG A Retrieval Augmented Generation System for Persian Language
Hossein Hosseini - Mohammad Sobhan Zare - Amir Hossein Mohammadi - Arefeh Kazemi - Zahra Zojaji - Mohammad Ali Nematbakhsh