Abstract

Moonlighting proteins (MPs) are a special class of proteins that perform multiple independent functions. MPs play vital roles in cellular regulation, disease, and biological pathways. To date, very few MPs have been discovered through biological experiments, and computational methods for identifying MPs are very limited; in particular, no de novo prediction method for MPs currently exists. Comprehensive research on and identification of MPs are therefore urgently required. In this paper, we propose a multimodal deep ensemble learning architecture, named MEL-MP, which is the first de novo computational model for predicting MPs. First, we extract four kinds of sequence-based features: primary sequence information, evolutionary information, physicochemical properties, and secondary structure information. Second, we construct a specific classifier for each kind of feature. Finally, we apply a stacked ensemble to integrate the outputs of the classifiers. Comprehensive model selection and cross-validation experiments show that assigning a specific classifier to each kind of feature achieves the best performance. To validate the effectiveness of the stacked-ensemble fusion, we compare MEL-MP with alternative feature fusion strategies, including direct combination and a multimodal deep auto-encoder. MEL-MP achieves superior prediction performance, with an F-score of 0.8911, surpassing the existing machine learning model MPFit (F-score of 0.784). In addition, we apply MEL-MP to all human proteins to predict potential MPs. We further explore the predicted MPs from three perspectives: their distribution across human chromosomes, their association with diseases, and their evolutionary history. The results show that the predicted MPs are significantly related to diseases, that the proportion of MPs on the Y chromosome is higher than on other chromosomes, and that MPs may have originated earlier than non-MPs.
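The three-step pipeline described above (one classifier per feature type, then a stacked ensemble over their outputs, evaluated by F-score) can be illustrated with a minimal sketch. This is not the MEL-MP implementation: the nearest-centroid base learners and the logistic-regression meta-learner are simple stand-ins for the deep classifiers, the four feature views are filled with synthetic random data, and all function and variable names are hypothetical.

```python
import math
import random


def sigmoid(a):
    """Numerically stable logistic function."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def fit_centroid_scorer(X, y):
    """Stand-in base learner for one feature view (assumption, not the paper's classifier).
    Score = distance to the negative centroid minus distance to the positive centroid."""
    dim = len(X[0])

    def centroid(label):
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(r[j] for r in rows) / len(rows) for j in range(dim)]

    pos, neg = centroid(1), centroid(0)
    return lambda x: math.dist(x, neg) - math.dist(x, pos)


def out_of_fold_scores(X, y, k=5):
    """Score each sample with a base learner that never saw it during fitting."""
    n = len(X)
    scores = [0.0] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        scorer = fit_centroid_scorer([X[i] for i in train], [y[i] for i in train])
        for i in range(fold, n, k):
            scores[i] = scorer(X[i])
    return scores


def fit_logistic(Z, y, lr=0.1, epochs=500):
    """Stand-in meta-learner: logistic regression fitted by batch gradient descent."""
    w, b, n = [0.0] * len(Z[0]), 0.0, len(Z)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for z, t in zip(Z, y):
            err = sigmoid(b + sum(wi * zi for wi, zi in zip(w, z))) - t
            grad_b += err
            for j, zj in enumerate(z):
                grad_w[j] += err * zj
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def f_score(y_true, y_pred):
    """F-score: harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Synthetic data only: four feature views named after the abstract's feature types.
rng = random.Random(0)
labels = [i % 2 for i in range(200)]
view_names = ["sequence", "evolutionary", "physicochemical", "secondary_structure"]
views = {
    name: [[rng.gauss(1.0 * t, 1.0) for _ in range(4)] for t in labels]
    for name in view_names
}

# Step 2: one base learner per view. Step 3: stack their out-of-fold scores.
meta_features = [list(z) for z in zip(*(out_of_fold_scores(X, labels) for X in views.values()))]
weights, bias = fit_logistic(meta_features, labels)
predicted = [
    1 if sigmoid(bias + sum(w * z for w, z in zip(weights, row))) >= 0.5 else 0
    for row in meta_features
]

# In-sample for the meta-learner; a real evaluation would use a held-out set.
f1 = f_score(labels, predicted)
print("meta-learner weights per view:", dict(zip(view_names, weights)))
print("F-score on synthetic data:", round(f1, 4))
```

Feeding the meta-learner out-of-fold scores rather than in-sample scores is the usual way to keep a stacked ensemble from simply trusting whichever base learner overfits most. The learned weights then indicate how much each feature view contributes to the final decision.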