Comparison of Hybrid and Filter Feature Selection Methods to Identify Candidate Single Nucleotide Polymorphisms



During the last decade, feature selection has become an essential step in building models in bioinformatics. This is due to the high-dimensional nature of many modeling tasks in the field, one of which is the selection of Single Nucleotide Polymorphisms (SNPs). In this paper, we propose three hybrid feature selection methods, named CNNFS, Ck-NNFS, and CRRFS, which combine filter and wrapper techniques. In the first step of our methods, filter techniques are applied to remove irrelevant and redundant features. In the second step, wrapper techniques refine the primary feature subset obtained in the first step, using Neural Network, k-Nearest Neighbor, and Ridge Regression as the induction algorithms. Because pure wrapper methods take a long time to run on high-dimensional data, we did not include them in the comparison; instead, we compared our methods with three well-known filter methods. The results show the effectiveness of the hybrid methods in SNP selection, in addition to their ability to reduce dimensionality. The CRRFS algorithm produced the most satisfactory results in terms of both the precision of recognizing candidate SNPs and their recall in the final SNP subset.
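The two-step filter-then-wrapper idea can be illustrated with a minimal sketch. It assumes details the abstract does not specify. The filter phase here ranks features by absolute Pearson correlation with the phenotype and keeps the top `k_filter`. The wrapper phase runs greedy forward selection over those candidates, scoring each subset by the cross-validated mean squared error of a closed-form ridge regression (the induction algorithm used in CRRFS). The function names, the correlation filter, the stopping rule, and the parameters `k_filter`, `alpha`, and `n_folds` are all hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def filter_step(X, y, k):
    """Hypothetical filter phase: keep the k features with the largest |Pearson r| to y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Constant (monomorphic) features get a correlation of 0 instead of dividing by zero.
    corr = np.where(denom > 0, np.abs(Xc.T @ yc) / np.where(denom > 0, denom, 1.0), 0.0)
    return np.argsort(-corr)[:k]


def ridge_fit_predict(X_train, y_train, X_test, alpha):
    """Closed-form ridge regression with an unpenalized intercept (via centering)."""
    x_mean = X_train.mean(axis=0)
    y_mean = y_train.mean()
    Xc = X_train - x_mean
    p = Xc.shape[1]
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(p), Xc.T @ (y_train - y_mean))
    return (X_test - x_mean) @ w + y_mean


def cv_mse(X, y, alpha, n_folds):
    """Mean squared error of ridge regression under contiguous K-fold cross-validation."""
    n = X.shape[0]
    errors = []
    for test_idx in np.array_split(np.arange(n), n_folds):
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        pred = ridge_fit_predict(X[train_mask], y[train_mask], X[test_idx], alpha)
        errors.append(np.mean((pred - y[test_idx]) ** 2))
    return float(np.mean(errors))


def wrapper_step(X, y, candidates, alpha=1.0, n_folds=5):
    """Hypothetical wrapper phase: greedy forward selection, stopping when CV error stops improving."""
    selected, remaining = [], list(candidates)
    best_err = np.inf
    while remaining:
        err, j = min((cv_mse(X[:, selected + [j]], y, alpha, n_folds), j) for j in remaining)
        if err >= best_err:
            break
        best_err = err
        selected.append(j)
        remaining.remove(j)
    return selected, best_err


def hybrid_select(X, y, k_filter=20, alpha=1.0, n_folds=5):
    """Filter first to shrink the search space, then refine the subset with the wrapper."""
    candidates = [int(c) for c in filter_step(X, y, k_filter)]
    return wrapper_step(X, y, candidates, alpha, n_folds)


if __name__ == "__main__":
    # Synthetic data: 500 SNPs coded 0/1/2, of which only SNPs 3, 42 and 100 affect y.
    rng = np.random.default_rng(0)
    X = rng.integers(0, 3, size=(200, 500)).astype(float)
    y = 1.5 * X[:, 3] - 2.0 * X[:, 42] + 1.0 * X[:, 100] + rng.normal(0, 0.5, 200)
    selected, err = hybrid_select(X, y)
    print(sorted(selected), round(err, 3))
```

The filter step is what keeps this affordable: the wrapper only evaluates subsets drawn from `k_filter` candidates rather than from all SNPs, which mirrors the abstract's motivation for not running pure wrappers on high-dimensional data.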