Persian Texts Part of Speech Tagging Using Artificial Neural Networks

Authors

1 Science and Research Branch, Islamic Azad University, Tehran

2 University of Isfahan

3 Khajeh Nasir Toosi University of Technology

Abstract

Part-of-speech (POS) tagging is a basic task in natural language processing applications such as morphological parsing, information retrieval, machine translation, and question answering. It assigns each word its part of speech (e.g., noun or verb). It is followed by a number of challenging steps, in particular disambiguation, named entity recognition, and compound verb detection. Most tagging approaches for Persian are based on hidden Markov models (HMMs) and rule-based models. Because Persian is a free word order language, those models cannot cope with the full complexity of the language in POS tagging, named entity recognition, word sense disambiguation, and other related tasks. In this paper, artificial neural networks (ANNs) are used for POS tagging because of their ability to learn complex patterns. In the first phase, an ANN is fed with raw data; in the second phase, the data are clustered and separate ANNs are trained for each cluster. Accuracy rates of 95.7% and 96.17% were obtained, respectively. Comparison with other approaches indicates that neural networks can perform POS tagging and named entity recognition more precisely than other methods.
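The second phase (cluster the data, then train a separate model for each cluster) can be illustrated with a minimal sketch. This is not the authors' implementation. The 2-D feature vectors, the tags "N" and "V", the farthest-first k-means initialization, and the use of one model per cluster are all assumptions made for illustration, and a multi-class perceptron stands in for the ANN to keep the example short and dependency-free.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(p, centroids):
    # Index of the centroid closest to point p.
    return min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))

def kmeans(points, k, iters=10):
    # Deterministic farthest-first initialization (an assumption of this sketch).
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [[sum(col) / len(g) for col in zip(*g)] if g else c
                     for g, c in zip(groups, centroids)]
    return centroids

class Perceptron:
    """Multi-class perceptron, a simple stand-in for the paper's ANN."""
    def __init__(self, n_features, tags):
        self.tags = list(tags)
        # One weight vector per tag; the last entry is the bias.
        self.w = {t: [0.0] * (n_features + 1) for t in self.tags}

    def score(self, x, t):
        w = self.w[t]
        return w[-1] + sum(wi * xi for wi, xi in zip(w, x))

    def predict(self, x):
        return max(self.tags, key=lambda t: self.score(x, t))

    def fit(self, X, y, epochs=20):
        for _ in range(epochs):
            for x, t in zip(X, y):
                guess = self.predict(x)
                if guess != t:
                    # Reward the correct tag, penalize the wrongly predicted one.
                    for i, xi in enumerate(x):
                        self.w[t][i] += xi
                        self.w[guess][i] -= xi
                    self.w[t][-1] += 1.0
                    self.w[guess][-1] -= 1.0

def centered(x, c):
    # Express a feature vector relative to its cluster centroid.
    return [xi - ci for xi, ci in zip(x, c)]

def train_clustered(X, y, k):
    centroids = kmeans(X, k)
    tags = sorted(set(y))
    models = []
    for c in range(k):
        idx = [i for i, x in enumerate(X) if nearest(x, centroids) == c]
        model = Perceptron(len(X[0]), tags)
        model.fit([centered(X[i], centroids[c]) for i in idx],
                  [y[i] for i in idx])
        models.append(model)
    return centroids, models

def tag(x, centroids, models):
    # Route the input to its cluster, then ask that cluster's model.
    c = nearest(x, centroids)
    return models[c].predict(centered(x, centroids[c]))

# Hypothetical toy data: two well-separated groups with opposite tagging rules.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11], [11, 10], [11, 11]]
y = ["N", "N", "V", "V", "V", "V", "N", "N"]

centroids, models = train_clustered(X, y, k=2)
print([tag(x, centroids, models) for x in X])
```

In this toy data the rule that maps features to tags differs between the two groups, so routing each input to a cluster-specific model lets each model learn a simpler decision boundary. That is one plausible motivation for the clustered setup, not a claim about the paper's data or features.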

Keywords