Subcellular Localization of Multi-class Proteins Using Label Power-set Encoding
Abstract
Knowing where a protein resides in a cell is essential to biological research, because a protein's location helps elucidate its function. Driven by large-scale genome sequencing projects, the number of newly discovered protein sequences is growing exponentially. Compared with expensive laboratory experiments, computational methods offer a far more efficient way to identify the subcellular locations of these proteins automatically and accurately. This thesis proposes an efficient multi-label prediction method, based on label power-set encoding, for predicting the subcellular localization of multi-location proteins, and evaluates it on two recently published datasets of Gram-negative bacterial proteins and plant proteins. Bacterial proteins play an important role in cell biology because of their relevance to drug design and antibiotics research. Localizing bacterial proteins is important because the function of a protein is closely linked to its location. A single Gram-negative bacterial protein can reside in multiple locations in a cell, which makes predicting its subcellular locations far more challenging. Our method applies a label power-set encoding scheme to the associated multi-label classification problem. Using a set of effective features also used in the literature, our encoding significantly improves over traditional approaches on several base classifiers. Tested on standard benchmark data, the method showed promising results.
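For readers unfamiliar with the term, label power-set encoding in general treats every distinct combination of labels as a single class, so a multi-label problem becomes an ordinary multi-class one that any base classifier can handle. The following is a minimal sketch of that general idea, not the implementation used in the thesis; the function names and the toy location names are assumptions made for illustration.

```python
def fit_label_powerset(label_sets):
    """Assign one integer class id to each distinct set of locations."""
    class_of = {}
    for labels in label_sets:
        key = frozenset(labels)  # order-independent, hashable key
        if key not in class_of:
            class_of[key] = len(class_of)
    return class_of


def encode(label_sets, class_of):
    """Turn each multi-label target into its single class id."""
    return [class_of[frozenset(labels)] for labels in label_sets]


def decode(class_ids, class_of):
    """Map predicted class ids back to sets of locations."""
    sets_by_id = {cid: key for key, cid in class_of.items()}
    return [set(sets_by_id[cid]) for cid in class_ids]


# Toy targets (hypothetical, for illustration only): each protein has
# one or more subcellular locations.
train_locations = [
    {"cytoplasm"},
    {"cell inner membrane", "periplasm"},
    {"cytoplasm"},
    {"periplasm", "cell inner membrane"},
    {"extracellular"},
]

class_of = fit_label_powerset(train_locations)
y = encode(train_locations, class_of)  # targets for any multi-class classifier
recovered = decode(y, class_of)
```

A base classifier trained on `y` predicts one class id per protein, and `decode` recovers the full set of locations. One known trade-off of this encoding is that it can only predict label combinations that appear in the training data.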
Collections
- M.Sc Thesis/Project [145]