Compartmentalization plays a major role in eukaryotic cells because it allows the fine regulation of complex biochemical pathways. Each protein needs the right biochemical context to operate, so knowing the subcellular localization of a protein is essential for understanding its function and its pattern of interactions in protein networks.
BaCelLo is a predictor for the subcellular localization of eukaryotic proteins, based on several Support Vector Machines (SVMs) arranged in a decision tree (Fig 1). Starting from the residue sequence, BaCelLo discriminates five localizations: secretory pathway, cytoplasm, nucleus, mitochondrion and chloroplast. The predictor analyzes the protein residue sequence and its evolutionary profile, considering information from the whole sequence and from its N- and C-terminal regions. Three predictors are available, one for each of three eukaryotic kingdoms: Metazoa, Viridiplantae and Fungi.
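The idea of binary classifiers arranged in a decision tree can be illustrated with a minimal sketch. The tree layout, the feature names and the `toy` decision functions below are hypothetical stand-ins chosen for illustration; they are not the trained SVMs or the actual topology shown in Fig 1.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union

# A binary decision function standing in for one trained SVM (assumption).
# It returns a signed score; a score >= 0 selects the "positive" branch.
Decider = Callable[[Dict[str, float]], float]


@dataclass
class Node:
    decide: Decider
    positive: Union["Node", str]  # subtree or final localization label
    negative: Union["Node", str]


def predict(node: Union[Node, str], features: Dict[str, float]) -> str:
    """Walk down the tree until a leaf (a localization label) is reached."""
    while isinstance(node, Node):
        node = node.positive if node.decide(features) >= 0 else node.negative
    return node


def toy(key: str) -> Decider:
    """Hypothetical decider: reads a precomputed score, defaulting to -1."""
    return lambda features: features.get(key, -1.0)


# Hypothetical layout covering the five localizations named in the text.
example_tree = Node(
    decide=toy("secretory"),
    positive="secretory pathway",
    negative=Node(
        decide=toy("organelle"),
        positive=Node(toy("chloroplast"), "chloroplast", "mitochondrion"),
        negative=Node(toy("nucleus"), "nucleus", "cytoplasm"),
    ),
)

if __name__ == "__main__":
    print(predict(example_tree, {"secretory": -1.0, "nucleus": 0.7}))
```

Each internal node makes a single two-class decision, so every classifier only has to separate two groups of localizations rather than all five at once.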
The distinctive features of BaCelLo are:
- a homology-reduced dataset for training and testing the predictor, in order to avoid redundancy. This dataset was compiled starting from the Swissprot database (release 48) and contains only proteins whose subcellular localization was experimentally annotated. The dataset was reduced by similarity so that no two proteins in it share more than 30% sequence identity;
- the implementation of three kingdom-specific predictors to take into account differences in subcellular localization mechanisms;
- the evolutionary profile to extract evolutionary information from the residue sequence;
- a hierarchical tree for the predictions;
- the introduction of a unique balancing procedure in SVMs that corrects the bias between the different classes caused by their disproportionate representation in the training set.
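The homology reduction mentioned in the first item of the list can be sketched as a greedy filter. This is only an illustration under stated assumptions: the `identity` function is a toy position-by-position comparison without alignment, whereas a real pipeline would compute identity from sequence alignments, and the function names and example sequences are hypothetical.

```python
from typing import Dict, List


def identity(a: str, b: str) -> float:
    """Toy identity (assumption): fraction of matching positions over the
    longer sequence length, with no alignment step."""
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def reduce_redundancy(seqs: Dict[str, str], threshold: float = 0.30) -> List[str]:
    """Greedily keep a sequence only if its identity to every sequence
    already kept does not exceed the threshold."""
    kept: List[str] = []
    for name, seq in seqs.items():
        if all(identity(seq, seqs[other]) <= threshold for other in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    example = {
        "a": "MKTAYIAKQR",
        "b": "MKTAYIAKQL",  # near-duplicate of "a"
        "c": "GGGGGGGGGG",
    }
    print(reduce_redundancy(example))
```

The result of a greedy filter depends on the order in which sequences are visited, so the set that is kept is one valid non-redundant subset, not necessarily the largest one.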
BaCelLo proved to outperform all the other publicly available state-of-the-art methods when validated on a set of protein sequences independent of the training set [1].