1. Weka (Java Based)
- You can subsample the majority class (try the filter SpreadSubsample, or GSVM-RU).
- You can oversample the minority class, creating synthetic examples (try SMOTE).
- You can make your classifier cost sensitive (try the metaclassifier CostSensitiveClassifier).
Each of these methods has its own strengths and weaknesses; refer to the papers cited in the documentation of each one. If you use any of them and need accurate probability estimates, you can use isotonic regression to calibrate the output.
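To make the oversampling option concrete, here is a minimal NumPy sketch of the core SMOTE idea: each synthetic minority example is an interpolation between a minority point and one of its k nearest minority-class neighbours. This is an illustrative sketch of the technique, not the Weka filter itself; the function name and parameters are my own.

```python
import numpy as np

def smote_like_oversample(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples by interpolating between
    each chosen minority point and one of its k nearest minority-class
    neighbours (the core idea behind SMOTE; illustrative sketch only)."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    # pairwise Euclidean distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)               # a point is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]         # k nearest neighbours per point
    base = rng.integers(0, n, size=n_new)     # seed point for each new sample
    neigh = nn[base, rng.integers(0, k, size=n_new)]  # one random neighbour each
    gap = rng.random((n_new, 1))              # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[neigh] - X_min[base])
```

Every synthetic point lies on the segment between two real minority points, so the new samples stay inside the minority class's region of the feature space.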
2. MATLAB
- The RUSBoost algorithm is available through the fitensemble function. An example is shown here: http://www.mathworks.com/help/stats/ensemble-methods.html#btgw1m1
- Additional suggestions on handling imbalanced data are here: http://www.mathworks.com/matlabcentral/answers/11549-leraning-classification-with-most-training-samples-in-one-category This advice is applicable to any number of classes.
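The "RUS" in RUSBoost is random undersampling of the majority class, applied before each boosting round. The undersampling step by itself can be sketched in a few lines; this is a NumPy illustration of the idea (not the MATLAB implementation), and it works for any number of classes, as the linked advice suggests.

```python
import numpy as np

def random_undersample(X, y, rng=None):
    """Randomly discard samples from the larger classes until every class
    has as many examples as the rarest one (illustrative sketch of the
    'RUS' step in RUSBoost; works for any number of classes)."""
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()                      # size of the rarest class
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    rng.shuffle(keep)                         # avoid class-ordered output
    return X[keep], y[keep]
```

In RUSBoost proper this resampling is redone on the boosting weights at every iteration; the sketch above shows a single balanced draw.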
3. KEEL
http://sci2s.ugr.es/keel/software/prototypes/openVersion/Algorithms_20130703.pdf
4. LASVM
http://leon.bottou.org/projects/lasvm
publication: Fast Kernel Classifiers with Online and Active Learning