Tuesday, November 12, 2013

Feature subset selection Using Genetic Algorithm in MATLAB


function callGeneticAlgo
% Driver for GA-based feature subset selection: each individual is a bit
% string that switches feature columns on or off, and the fitness is
% 1 - AUC of a linear SVM trained on the selected features.

global mat
global trainInd
global testInd
global counter
global errList

% 70/30 train/test split over the 1420 samples
[trainInd,~,testInd] = dividerand(1420,0.7,0,0.3);

counter = 1;
errList = [];

fileName = '../features/alltopPNPDMF.feature';
mat = load(fileName);

% arg1 = number of features, excluding the label column
% arg2 = population size
% arg3 = time limit in seconds (3 hours = 10800 sec)
[x,fval,exitflag,output,population,score] = gaFeaSelection(1588,100,10800);

dlmwrite('selected.GA',x,'delimiter','\n');

disp('Done');

end


function [x,fval,exitflag,output,population,score] = gaFeaSelection(nvars,PopulationSize_Data,TimeLimit_Data)
% Auto-generated by the MATLAB Optimization Tool; runs ga over bit strings of length nvars.

% Start with the default options
options = gaoptimset;
% Modify options setting
options = gaoptimset(options,'PopulationType', 'bitString');
options = gaoptimset(options,'PopulationSize', PopulationSize_Data);
options = gaoptimset(options,'TimeLimit', TimeLimit_Data);
options = gaoptimset(options,'MutationFcn', {  @mutationuniform [] });
options = gaoptimset(options,'Display', 'iter');
options = gaoptimset(options,'PlotFcns', { @gaplotbestf });
[x,fval,exitflag,output,population,score] = ...
ga(@feaSelobjFun,nvars,[],[],[],[],[],[],[],options);

end



function [ evalValue ] = feaSelobjFun( x )
%FEASELOBJFUN GA fitness for feature subset selection.
%   x is a bit string; a 1 keeps the corresponding feature column.
%   The fitness is 1 - AUC of a linear SVM on the held-out test split,
%   so ga minimizes it.
global mat
global trainInd
global testInd
global counter
global errList

Data  = mat(:,1:end-1);
Label = mat(:,end);
selectedFeature = Data(:, x~=0);   % keep only the features switched on by x

svmStruct = svmtrain(selectedFeature(trainInd,:),Label(trainInd),...
            'kernel_function','linear',...
            'method','SMO','kktviolationlevel',.55);
predictedOut = svmclassify(svmStruct,selectedFeature(testInd,:));

% AUC is computed from the hard class predictions on the test split
[~,~,~,AUC] = perfcurve(Label(testInd),predictedOut,1);
evalValue = 1 - AUC;
errList(counter) = evalValue;   % was fval (undefined); log this evaluation's fitness
counter = counter + 1;

end



Friday, November 1, 2013

Feature subset selection toolbox collection



0. DEAP:


DEAP: Evolutionary Algorithms Made Easy
Genetic-algorithm-based multi-objective feature selection techniques.


http://jmlr.org/papers/volume13/fortin12a/fortin12a.pdf

1. Weka
Filter and wrapper methods

2. Java-ML: A Machine Learning Library 

http://jmlr.org/papers/volume10/abeel09a/abeel09a.pdf

Entropy-based methods (4)
Stepwise addition/removal (2)
SVM-RFE
Random forests
Ensemble feature selection


3. MATLAB:

Sequential feature selection (a minimal sequentialfs sketch follows this list):

http://www.mathworks.com/help/stats/feature-selection.html

Genetic Algorithm based:

http://www.mathworks.com/matlabcentral/fileexchange/29553-feature-selector-based-on-genetic-algorithms-and-information-theory/content/GA_feature_selector.m

4. KEEL

http://sci2s.ugr.es/keel/algorithms.php#featureselection
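
A minimal sketch of MATLAB's sequential feature selection (item 3 above), assuming a feature matrix Data and a label vector Label, and using the misclassification count of a linear discriminant as the criterion (the variable names are placeholders, not from the post):

% Hypothetical sequentialfs example: forward selection with 5-fold CV
fun  = @(XT,yT,Xt,yt) sum(yt ~= classify(Xt,XT,yT));   % misclassifications on the fold's test part
opts = statset('Display','iter');
[selected,history] = sequentialfs(fun,Data,Label,'cv',5,'options',opts);
selectedIdx = find(selected);   % column indices of the chosen features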

Imbalanced data set problems: a review of tools


1. Weka (Java Based)


  • You can subsample the majority class (try the filter SpreadSubsample, or GSVM-RU).
  • You can oversample the minority class, creating synthetic examples (try SMOTE).
  • You can make your classifier cost sensitive (try the metaclassifier CostSensitiveClassifier).
http://weka.wikispaces.com/space/content?tag=cost-sensitive

Each of these methods has its own strengths and weaknesses; refer to the papers cited in the documentation of each one. If you use any of them and need accurate probability estimates, you can calibrate the output with isotonic regression.

2. MATLAB (no specific tool noted; see the undersampling sketch after this list)


 3. KEEL tool (Java Based)

http://sci2s.ugr.es/keel/software/prototypes/openVersion/Algorithms_20130703.pdf

4. LASVM

http://leon.bottou.org/projects/lasvm

publication: Fast Kernel Classifiers with Online and Active Learning
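
The MATLAB entry above lists no specific tool; a minimal pure-MATLAB sketch of random undersampling of the majority class, assuming a feature matrix Data and a binary label vector Label whose minority class is 1 (names are placeholders, not from the post):

% Hypothetical random-undersampling example
minIdx = find(Label == 1);                                 % minority class (assumed to be label 1)
majIdx = find(Label ~= 1);                                 % majority class
majIdx = majIdx(randperm(numel(majIdx), numel(minIdx)));   % keep as many majority as minority samples
keep   = [minIdx; majIdx];
balancedData  = Data(keep,:);
balancedLabel = Label(keep);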



Using the genetic algorithm in the MATLAB Global Optimization Toolbox


Useful tutorial


http://www.mathworks.com/products/global-optimization/description3.html

Best example of an implementation with a constraint and an objective function:

http://www.mathworks.com/help/gads/examples/constrained-minimization-using-the-genetic-algorithm.html

More about multi-objective optimization:

http://www.mathworks.com/discovery/multiobjective-optimization.html


http://www.mathworks.com/help/gads/examples/performing-a-multiobjective-optimization-using-the-genetic-algorithm.html

http://www.mathworks.com/help/gads/examples/multiobjective-genetic-algorithm-options.html

Example: GAMULTIOBJ (handles multiple objectives) vs. GA (handles a single objective)

Constrained Minimization Problem
We want to minimize two simple fitness functions of two variables x1 and x2
(only the first is used in the single-objective case):
   min f1(x) = 100 * (x1^2 - x2)^2 + (1 - x1)^2
    x
   min f2(x) = 100 * (x1^2 + x2)^2 + (1 + x1)^2
    x
such that the following two nonlinear constraints and bounds are satisfied:
   x1*x2 + x1 - x2 + 1.5 <= 0   (nonlinear constraint)
   10 - x1*x2 <= 0              (nonlinear constraint)
   0 <= x1 <= 1                 (bound)
   0 <= x2 <= 13                (bound)

Implementation of the single-objective function
   function y = simple_objective(x)
    y = 100 * (x(1)^2 - x(2)) ^2 + (1 - x(1))^2;
 
Implementation of the multi-objective function

   function y = simple_multiobjective(x)
    y(1) = 100 * (x(1)^2 - x(2)) ^2 + (1 - x(1))^2;
    y(2) = 100 * (x(1)^2 + x(2)) ^2 + (1 + x(1))^2; 

Implementation of the Constraint Function
   function [c, ceq] = simple_constraint(x)
   c = [1.5 + x(1)*x(2) + x(1) - x(2);
   -x(1)*x(2) + 10];
   ceq = [];
 
Calling GAMULTIOBJ / GA
 
ObjectiveFunctionOne  = @simple_objective;
ObjectiveFunctionMult = @simple_multiobjective;
nvars = 2;
A = []; b = [];
Aeq = []; beq = [];
LB = [0 0];   % Lower bounds (can also be set to [])
UB = [1 13];  % Upper bounds (can also be set to [])
options = gaoptimset('PlotFcns',{@gaplotpareto,@gaplotscorediversity});

% multi-objective run
[x,fval,exitflag,output,population,score] = gamultiobj(ObjectiveFunctionMult, nvars, A, b, Aeq, beq, LB, UB, options)

% gamultiobj called with the single-objective function
[x,fval,exitflag,output,population,score] = gamultiobj(ObjectiveFunctionOne, nvars, A, b, Aeq, beq, LB, UB, options)


But with GA you can also add a nonlinear constraint:

http://www.mathworks.com/help/gads/examples/constrained-minimization-using-the-genetic-algorithm.html

function [c, ceq] = simple_constraint(x)
   c = [1.5 + x(1)*x(2) + x(1) - x(2);
        -x(1)*x(2) + 10];
   ceq = [];

ConstraintFunction = @simple_constraint;
[x,fval,exitflag,output,population,score] = ga(ObjectiveFunctionOne,nvars,[],[],[],[],LB,UB,ConstraintFunction);

Example: GAMULTIOBJ / GA with a BITSTRING population

%% using gamultiobj
ObjectiveFunction = @myFitnessFnc_bitString;
nvars = 2;    % Number of variables (bits)
LB = [0 0];   % Lower bound (not used below; bounds don't apply to a bitstring population)
UB = [1 13];  % Upper bound (not used below)
options = gaoptimset('PopulationSize',60,...
          'ParetoFraction',0.7,'PlotFcns',@gaplotpareto,'PopulationType','bitstring');
[x,fval,exitflag,output,population,score] = gamultiobj(ObjectiveFunction,nvars,[],[],[],[],[],[],options)

%% using ga
ObjectiveFunction = @myFitnessFnc_bitString;
nvars = 2;    % Number of variables (bits)
options = gaoptimset('PopulationSize',60,...
          'PlotFcns',@gaplotbestf,'PopulationType','bitstring');  % ParetoFraction and gaplotpareto only apply to gamultiobj
[x,fval,exitflag,output,population,score] = ga(ObjectiveFunction,nvars,[],[],[],[],[],[],[],options)
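
The fitness function @myFitnessFnc_bitString is not defined in the post. A hypothetical minimal bit-string fitness, only to make the two calls above runnable (the target pattern is made up):

function y = myFitnessFnc_bitString(x)
% Hypothetical bit-string fitness: number of bits differing from a fixed target.
% Any function mapping a 0/1 vector of length nvars to a scalar (or to a vector
% of objectives for gamultiobj) can be used instead.
target = [1 0];            % assumed target pattern of length nvars
y = sum(x(:)' ~= target);  % ga minimizes the mismatch count
end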