1. Weka (Java Based)
- You can subsample the majority class (try the filter SpreadSubsample, or GSVM-RU).
- You can oversample the minority class, creating synthetic examples (try SMOTE).
- You can make your classifier cost sensitive (try the metaclassifier CostSensitiveClassifier).
Each of these methods has its own strengths and weaknesses; refer to the papers cited in the documentation of each one. If you use any of them and need accurate probability estimates, you can use isotonic regression to calibrate the output.
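To make the oversampling option concrete, here is a minimal NumPy sketch of the core SMOTE idea: each synthetic minority example is an interpolation between a minority point and one of its k nearest minority-class neighbours. This is an illustrative sketch of the technique, not the Weka filter itself; the function name and parameters are my own.

```python
import numpy as np

def smote_like_oversample(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples by interpolating between
    each chosen minority point and one of its k nearest minority-class
    neighbours (the core idea behind SMOTE; illustrative sketch only)."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    # pairwise Euclidean distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)               # a point is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]         # k nearest neighbours per point
    base = rng.integers(0, n, size=n_new)     # seed point for each new sample
    neigh = nn[base, rng.integers(0, k, size=n_new)]  # one random neighbour each
    gap = rng.random((n_new, 1))              # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[neigh] - X_min[base])
```

Every synthetic point lies on the segment between two real minority points, so the new samples stay inside the minority class's region of the feature space.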
2. MATLAB
- The RUSBoost algorithm is available through the fitensemble function. An example is shown here: http://www.mathworks.com/help/stats/ensemble-methods.html#btgw1m1
- Additional suggestions on handling imbalanced data are here: http://www.mathworks.com/matlabcentral/answers/11549-leraning-classification-with-most-training-samples-in-one-category This advice is applicable to any number of classes.
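The "RUS" in RUSBoost is random undersampling of the majority class, applied before each boosting round. The undersampling step by itself can be sketched in a few lines; this is a NumPy illustration of the idea (not the MATLAB implementation), and it works for any number of classes, as the linked advice suggests.

```python
import numpy as np

def random_undersample(X, y, rng=None):
    """Randomly discard samples from the larger classes until every class
    has as many examples as the rarest one (illustrative sketch of the
    'RUS' step in RUSBoost; works for any number of classes)."""
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()                      # size of the rarest class
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    rng.shuffle(keep)                         # avoid class-ordered output
    return X[keep], y[keep]
```

In RUSBoost proper this resampling is redone on the boosting weights at every iteration; the sketch above shows a single balanced draw.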
3. KEEL
http://sci2s.ugr.es/keel/software/prototypes/openVersion/Algorithms_20130703.pdf
4. LASVM
http://leon.bottou.org/projects/lasvm
publication: Fast Kernel Classifiers with Online and Active Learning