Authors

1 Department of Computer Engineering, Faculty of Engineering, Bu-Ali Sina University, Hamedan, Iran.

2 Department of Computer Architecture, Faculty of Computer Engineering, University of Isfahan, Isfahan, Iran.

Abstract

Author identification seeks to characterize the author of a text so that texts written by different people can be reliably distinguished from one another. The rapid development of Internet communication has made anonymous Internet tools, such as emails and weblogs, popular means of communication for the perpetrators of illegal acts, which has raised security concerns. The Persian language is of interest to many individuals and organizations for political, social, artistic, cultural and religious reasons. In this paper, a number of intelligent writeprint methods that help automatically identify a Persian writer based on his/her writing style are studied and compared. For this purpose, after collecting two different databases, five feature types, including lexical, syntactic, semantic and application-specific features, were used to extract stylometric characteristics. The KNN, Delta, Neural Network, Decision Tree and Linear Discriminant Analysis classification methods were then applied to these databases. A comparison of the results showed that Linear Discriminant Analysis and KNN ranked first and second, respectively, in terms of accuracy among the studied methods.

Keywords