The idea of this algorithm is quite simple. As in feature selection, we rank the attributes; in addition, we apply nonlinear transformations to each attribute independently, choosing the one that maximizes the value of the ranking criterion. The algorithm follows these steps:
This simple and fast procedure usually improves classification accuracy.
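The procedure can be sketched roughly as follows. This is only a minimal illustration under stated assumptions: the ranking criterion (a Fisher-style ratio of between-class to within-class scatter), the candidate transformations (identity, log, sqrt, square) and all function names are hypothetical choices made for the example, not necessarily the ones used to obtain the results reported here.

```python
import math


def fisher_score(values, labels):
    """Assumed ranking criterion: between-class scatter / within-class scatter."""
    overall = sum(values) / len(values)
    between = 0.0
    within = 0.0
    for c in set(labels):
        group = [v for v, y in zip(values, labels) if y == c]
        mean_c = sum(group) / len(group)
        between += len(group) * (mean_c - overall) ** 2
        within += sum((v - mean_c) ** 2 for v in group)
    if within == 0.0:
        # Constant feature scores 0; perfectly separated classes score infinity.
        return 0.0 if between == 0.0 else float("inf")
    return between / within


def candidate_transforms(values):
    """Hypothetical set of nonlinear transformations for a single attribute."""
    lo = min(values)  # shift so that log and sqrt receive non-negative input
    return {
        "identity": lambda v: v,
        # max(..., 0.0) guards against unseen values smaller than the training minimum
        "log": lambda v: math.log1p(max(v - lo, 0.0)),
        "sqrt": lambda v: math.sqrt(max(v - lo, 0.0)),
        "square": lambda v: v * v,
    }


def best_transform(values, labels):
    """Return (name, function, score) of the transform with the highest criterion value."""
    best = None
    for name, fn in candidate_transforms(values).items():
        score = fisher_score([fn(v) for v in values], labels)
        if best is None or score > best[2]:
            best = (name, fn, score)
    return best


def transform_dataset(X, y):
    """Pick a transform for each attribute independently and apply it to every row."""
    n_features = len(X[0])
    chosen = [best_transform([row[j] for row in X], y) for j in range(n_features)]
    X_new = [[chosen[j][1](row[j]) for j in range(n_features)] for row in X]
    return X_new, [(name, score) for name, _, score in chosen]
```

Because the identity is always among the candidates, the criterion value of a transformed attribute can never be lower than that of the original one. In practice the transformations should be selected on the training part only and then applied unchanged to the test part.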
Example results for a 1NN classifier (results marked as trans were obtained with the described algorithm):
BER_loss is the balanced error rate (the mean of the per-class error rates).
class_loss is the ordinary error rate (the fraction of misclassified samples).
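To make the difference between the two measures concrete, here is a small sketch that computes both from true and predicted labels. The function name `error_rates` is only illustrative.

```python
def error_rates(y_true, y_pred):
    """Return (BER_loss, class_loss) for two equally long label sequences."""
    per_class = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        wrong = sum(1 for i in idx if y_pred[i] != c)
        per_class.append(wrong / len(idx))  # error rate within class c
    ber_loss = sum(per_class) / len(per_class)  # mean of per-class error rates
    class_loss = sum(1 for t, p in zip(y_true, y_pred) if t != p) / len(y_true)
    return ber_loss, class_loss
```

For an imbalanced set with eight samples of class 0 and two of class 1, a classifier that always predicts class 0 has a class_loss of 20% but a BER_loss of 50%, which is why the balanced measure is more informative for skewed data.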
As shown, the method works quite well; however, the variance for Cleveland is too high. Similarly, the results obtained for Hyperthyroid are also very bad (BER = 66%), so the method sometimes fails. Such high variance means that the results are sometimes very good and sometimes very bad (but usually good).