SimCat: Similarity-Based Category-Aware Answer Selection for Persian Question Answering

Document Type : Research Article

Authors

1 Department of Software Engineering, University of Isfahan, Isfahan, Iran

2 Department of Software Engineering, University of Isfahan, Iran

DOI: 10.22108/jcs.2024.142244.1146

Abstract

Answer Selection is one of the main tasks of Question Answering (QA) systems: given a question and a set of candidate sentences, it ranks the candidates by their relevance and similarity to the question and selects the top-ranked sentence as the final answer. Research in this field to date has focused primarily on English, with no research on open-domain Answer Selection in Persian, one of the main reasons being the lack of Persian open-domain Answer Selection datasets. In this paper, we introduce a Similarity-Based Category-Aware method for Answer Selection, analyse the effectiveness of measuring sentence similarity from several aspects (lexical, syntactic, and semantic) rather than a single one, and evaluate the method on three benchmark English datasets and four new datasets that we created for factoid open-domain Answer Selection in Persian. In addition to improving the accuracy of Answer Selection, the method reduces the time required by removing unrelated candidate sentences based on the categories of both the question and the candidate answer. Evaluation on both English and Persian shows that the proposed approach improves Answer Selection in terms of Mean Average Precision (MAP) and Mean Reciprocal Rank (MRR) by an average of 5.1% and 1.8% for English and 5.3% and 3.2% for Persian, respectively. It also reduces the required time by an average of 79% for English and 69% for Persian.
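As a rough illustration of the pipeline the abstract outlines (filter candidates by category, rank the remainder by similarity, score the ranking with MAP and MRR), the following minimal Python sketch may help. It is not the authors' implementation: the category labels, the function names, and the use of token-set Jaccard overlap as the only similarity signal are assumptions made for illustration, whereas the paper combines lexical, syntactic, and semantic measures. The toy sentences are likewise invented.

```python
from typing import List, Set, Tuple

# (sentence, category label); the labels here are hypothetical placeholders.
Candidate = Tuple[str, str]


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard overlap, a stand-in for a lexical similarity measure."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def rank_candidates(question: str, expected_category: str,
                    candidates: List[Candidate]) -> List[str]:
    """Drop candidates whose category differs, then sort by similarity."""
    kept = [s for s, cat in candidates if cat == expected_category]
    return sorted(kept, key=lambda s: jaccard(question, s), reverse=True)


def reciprocal_rank(ranked: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant sentence, or 0 if none is retrieved."""
    for i, s in enumerate(ranked, start=1):
        if s in relevant:
            return 1.0 / i
    return 0.0


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Mean of precision@k over the ranks k where a relevant sentence appears."""
    hits, total = 0, 0.0
    for i, s in enumerate(ranked, start=1):
        if s in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0


# Toy example with invented data.
question = "who wrote the shahnameh"
candidates = [
    ("rumi wrote the masnavi", "PERSON"),
    ("the shahnameh was completed in 1010", "DATE"),
    ("ferdowsi wrote the shahnameh", "PERSON"),
]
relevant = {"ferdowsi wrote the shahnameh"}

ranked = rank_candidates(question, "PERSON", candidates)
rr = reciprocal_rank(ranked, relevant)
ap = average_precision(ranked, relevant)
```

MRR and MAP are then the means of `rr` and `ap` over all questions in a dataset. Filtering before ranking is what would save time in such a setup, since the similarity function runs only on the candidates that survive the category check.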

Articles in Press, Accepted Manuscript
Available Online from 10 December 2024
  • Received: 24 July 2024
  • Revised: 31 October 2024
  • Accepted: 10 December 2024