Combination of Genetic Programming and Support Vector Machine-Based Prediction of Protein-Peptide Binding Sites With Sequence and Structure-Based Features

Document Type: Research Article

Authors

Department of Computer Engineering and Information Technology, Razi University, Kermanshah, Iran.

Abstract

Predicting the peptide-binding sites of proteins is an essential task in areas such as understanding biological processes, protein functional analysis, comparison of functional sites, comprehension of transaction mechanisms, drug design, cellular signaling, and cancer treatment. Predictive analysis of the protein-peptide binding site is one of the most challenging problems in bioinformatics, and experimental methods are time-consuming, costly, and laborious. We therefore propose a machine learning-based method for predicting protein-peptide binding sites that uses an enhanced feature vector derived from three-dimensional protein structure and one-dimensional sequence data. To this end, genetic programming is applied to the basic features to extract a more discriminative feature vector. A support vector machine is then used to classify each amino acid residue as binding or non-binding. Finally, the binding sites are predicted by applying a structure clustering algorithm to the predicted binding residues. The proposed method was evaluated on the BioLiP dataset, achieving prediction rates of 92.76% with 10-fold cross-validation and 93.09% on an independent test set. These results were compared with the performance of other state-of-the-art methods. The proposed method achieves robust and consistent performance using sequence-based and structure-based features in both 10-fold cross-validation and independent tests.
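
The final step, grouping predicted binding residues into binding sites, can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm, so the code below substitutes a simple distance-cutoff grouping (connected components over residue coordinates) as a stand-in. The function name `cluster_binding_residues`, the input format, and the 8.0 Å cutoff are all illustrative assumptions, not details of the proposed method.

```python
import math
from collections import deque


def cluster_binding_residues(coords, cutoff=8.0):
    """Group predicted binding residues into candidate binding sites.

    coords: dict mapping residue id -> (x, y, z) coordinate in angstroms
            (for example, the C-alpha atom); a hypothetical input format.
    cutoff: residues within this distance of each other are linked; the
            default of 8.0 is an illustrative assumption.
    Returns a list of clusters (sorted lists of residue ids), largest first.
    """
    unvisited = set(coords)
    clusters = []
    while unvisited:
        seed = min(unvisited)  # deterministic starting residue
        unvisited.remove(seed)
        queue = deque([seed])
        members = [seed]
        # Breadth-first expansion: pull in every residue reachable through
        # a chain of neighbours that are each within the cutoff.
        while queue:
            current = queue.popleft()
            near = [r for r in unvisited
                    if math.dist(coords[current], coords[r]) <= cutoff]
            for r in near:
                unvisited.remove(r)
                queue.append(r)
                members.append(r)
        clusters.append(sorted(members))
    # Largest cluster first; ties broken by residue ids for determinism.
    clusters.sort(key=lambda c: (-len(c), c))
    return clusters


# Toy example: residues 10-12 sit close together, residue 40 is isolated.
predicted = {
    10: (0.0, 0.0, 0.0),
    11: (3.8, 0.0, 0.0),
    12: (7.6, 0.0, 0.0),
    40: (30.0, 30.0, 30.0),
}
sites = cluster_binding_residues(predicted)
print(sites)  # [[10, 11, 12], [40]]
```

In this sketch the largest cluster would be reported as the most likely binding site. A density-based algorithm could replace the cutoff rule without changing the surrounding interface.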
