
Imbalanced data set problems: a review of tools for solving them


1. Weka (Java Based)


  • You can subsample the majority class (try the filter SpreadSubsample, or GSVM-RU).
  • You can oversample the minority class, creating synthetic examples (try SMOTE).  
  • You can make your classifier cost sensitive (try the metaclassifier CostSensitiveClassifier).  
http://weka.wikispaces.com/space/content?tag=cost-sensitive
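The three ideas in the list above can be illustrated outside Weka with a small, standard-library-only Python sketch. This is not Weka's implementation of SpreadSubsample, SMOTE, or CostSensitiveClassifier; the function names (`undersample`, `smote_like`, `min_cost_class`), the toy data, and the cost values are all hypothetical and chosen only for illustration.

```python
import random

def undersample(majority, target_size, rng):
    """Keep a random subset of the majority class (sampling without replacement)."""
    return rng.sample(majority, target_size)

def smote_like(minority, n_new, k, rng):
    """Create n_new synthetic points, each interpolated between a minority
    point and one of its k nearest minority neighbours (SMOTE-style)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: dist2(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic

def min_cost_class(p_minority, cost_fn, cost_fp):
    """Predict the class with the lower expected misclassification cost.
    cost_fn: cost of missing a minority example; cost_fp: cost of a false alarm."""
    cost_if_predict_majority = p_minority * cost_fn
    cost_if_predict_minority = (1.0 - p_minority) * cost_fp
    if cost_if_predict_minority < cost_if_predict_majority:
        return "minority"
    return "majority"

# Toy 2-D data (assumed for illustration): 100 majority vs. 10 minority points.
rng = random.Random(0)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
minority = [(rng.gauss(3, 0.5), rng.gauss(3, 0.5)) for _ in range(10)]

balanced_down = undersample(majority, len(minority), rng)   # 10 vs. 10
synthetic = smote_like(minority, 90, k=3, rng=rng)
balanced_up = minority + synthetic                          # 100 vs. 100
```

With a false-negative cost ten times the false-positive cost, `min_cost_class(0.2, cost_fn=10, cost_fp=1)` already predicts the minority class at a probability of only 0.2, which is the effect a cost-sensitive wrapper aims for without touching the training data.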

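Each of the methods has its own strengths and weaknesses; refer to the papers referenced in the documentation of each one. Resampling and cost weighting change the class distribution the classifier sees, so its predicted probabilities no longer match the original data. If you use any of these methods and need accurate probability estimates, you can calibrate the output with isotonic regression.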
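As a rough illustration of the calibration step, the sketch below fits an isotonic (non-decreasing) mapping from raw classifier scores to probabilities using the pool-adjacent-violators algorithm. It uses only the Python standard library; the function names `isotonic_fit` and `isotonic_predict` and the toy scores and labels are assumptions made for this example, not part of Weka or any other tool listed here. In practice the mapping should be fitted on held-out data with the original class distribution.

```python
import bisect

def isotonic_fit(scores, labels):
    """Pool-adjacent-violators: return (sorted_scores, fitted_values), where
    fitted_values is non-decreasing in the score."""
    pairs = sorted(zip(scores, labels))
    blocks = []  # each block is [sum_of_labels, count]
    for _, y in pairs:
        blocks.append([float(y), 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    fitted = []
    for s, n in blocks:
        fitted.extend([s / n] * n)
    return [x for x, _ in pairs], fitted

def isotonic_predict(xs, fitted, score):
    """Step-function lookup: value of the closest training score <= score
    (or the first fitted value if score is below all training scores)."""
    i = bisect.bisect_right(xs, score) - 1
    return fitted[max(i, 0)]

# Toy held-out scores and true labels (1 = minority class), assumed for illustration.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
labels = [0, 0, 1, 0, 0, 1, 1, 1]

xs, fitted = isotonic_fit(scores, labels)
calibrated = isotonic_predict(xs, fitted, 0.45)
```

Here the out-of-order labels at scores 0.3 to 0.5 are pooled into a single block, so any raw score in that range maps to the block's average label rather than to an overconfident 0 or 1.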

2. MATLAB


3. KEEL tool (Java Based)

http://sci2s.ugr.es/keel/software/prototypes/openVersion/Algorithms_20130703.pdf

4. LASVM

http://leon.bottou.org/projects/lasvm

Publication: Fast Kernel Classifiers with Online and Active Learning (Bordes, Ertekin, Weston, and Bottou, JMLR 2005)


