My Blog

Posts

Showing posts from 2013

Feature subset selection Using Genetic Algorithm in MATLAB

    function callGeneticAlgo
        global mat
        global trainInd
        global testInd
        [trainInd,~,testInd] = dividerand(1420,0.7,0,0.3);
        global counter
        global errList
        counter = 1;
        errList = [];
        fileName = '../features/alltopPNPDMF.feature';
        mat = load(fileName);
        [x,fval,exitflag,output,population,score] = gaFeaSelection(1588,100,10800);
        % param1 = number of features, excluding the label
        % param2 = population size
        % param3 = time limit in seconds (3 hours = 10800 sec)
        dlmwrite('selected.GA',x,'delimiter','\n');
        display('Done');
    end

    function [x,fval,exitflag,output,population,score] = gaFeaSelection(nvars,PopulationSize_Data,TimeLimit_Data)
        % This is an auto generated MATLAB file from Optimization Tool.
        % Start with the default options
        options = gaoptimset;
        % Modify options setting
        options = gaoptimset(options,'PopulationType', 'bitString');
        options = gaoptimset(options,'PopulationSize', PopulationSize_Data);
        options = gaoptimset(options,'TimeLimit', T...

Feature subset selection toolbox collection

0. DEAP (Evolutionary Algorithms Made Easy): genetic algorithm based multi-objective feature selection techniques.
   http://jmlr.org/papers/volume13/fortin12a/fortin12a.pdf
1. Weka: Filter, Wrapper
2. Java-ML (A Machine Learning Library): http://jmlr.org/papers/volume10/abeel09a/abeel09a.pdf
   - Entropy based methods (4)
   - Stepwise addition/removal (2)
   - SVMRFE
   - Random forests
   - Ensemble feature selection
3. MATLAB
   - Sequential feature selection: http://www.mathworks.com/help/stats/feature-selection.html
   - Genetic Algorithm based: http://www.mathworks.com/matlabcentral/fileexchange/29553-feature-selector-based-on-genetic-algorithms-and-information-theory/content/GA_feature_selector.m
4. KEEL: http://sci2s.ugr.es/keel/algorithms.php#featureselection

Imbalanced set problems: Tools review to solve

1. Weka (Java based)
   - You can subsample the majority class (try the filter SpreadSubsample, or GSVM-RU).
   - You can oversample the minority class, creating synthetic examples (try SMOTE).
   - You can make your classifier cost sensitive (try the metaclassifier CostSensitiveClassifier): http://weka.wikispaces.com/space/content?tag=cost-sensitive
   Each of the methods has its own strengths and weaknesses; refer to the papers referenced in the documentation of each one.
   If you use any of these and you need accurate probability estimates, you can use an isotonic regression to calibrate the output.
2. MATLAB
   - The RUSBoost algorithm is available from the fitensemble function. An example is shown here: http://www.mathworks.com/help/stats/ensemble-methods.html#btgw1m1
   - Additional suggestions on imbalanced data here: http://www.mathworks.com/matlabcentral/answers/11549-leraning-classification-with-most-training-samples-in-one-c...
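To make the "subsample the majority class" idea concrete, here is a minimal Python sketch of plain random undersampling. It is not Weka's SpreadSubsample filter or RUSBoost, only the basic idea behind them; the function name `undersample` and the toy data are made up for illustration.

```python
import random
from collections import defaultdict

def undersample(rows, labels, seed=0):
    """Randomly drop rows until every class has as many rows as the smallest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    # The smallest class decides how many rows each class keeps.
    target = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        for row in rng.sample(members, target):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels

# Toy data: 8 positive rows, 2 negative rows.
rows = [[i] for i in range(10)]
labels = [1] * 8 + [-1] * 2
balanced_rows, balanced_labels = undersample(rows, labels)
```

Undersampling throws data away, so it is probably best applied to the training folds only, leaving the test fold at its original class ratio.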

MATLAB optimization toolbox usage with genetic algorithm

Useful tutorial: http://www.mathworks.com/products/global-optimization/description3.html
Best example of implementation with constraints and an objective function: http://www.mathworks.com/help/gads/examples/constrained-minimization-using-the-genetic-algorithm.html
More about how to use multi-objective optimization:
http://www.mathworks.com/discovery/multiobjective-optimization.html
http://www.mathworks.com/help/gads/examples/performing-a-multiobjective-optimization-using-the-genetic-algorithm.html
http://www.mathworks.com/help/gads/examples/multiobjective-genetic-algorithm-options.html

Example
GAMULTOBJ (can handle multiple objectives)
GA (can handle 1 objective)

Constrained Minimization Problem
We want to minimize a simple fitness function of two variables x1 and x2 (minimizing over x):
    min f(x) = 100 * (x1^2 - x2)^2 + (1 - x1)^2
    min f(x) = 100 * (x1^2 + x2)^2 + (1 + x1)^2
such that the following two nonlinear constraints and bounds are satisfied:
    x1*x2 + x1 - x2 + 1.5 <...

video lecture on different topics

1. Linear Programming / Linear Optimization
   Fundamentals of Operations Research, Lec-3: Linear Programming Solutions (IIT Madras)
   http://www.youtube.com/watch?v=XEA1pOtyrfo
   http://nptel.iitm.ac.in
2.

Call matlab from C/C++ java or Call C/C++ java from Matlab matlab binary calling

Call MATLAB from C/C++, Java, etc.
http://www.mathworks.com/help/matlab/matlab_external/calling-matlab-software-from-a-c-application.html
http://www.mathworks.com/help/matlab/matlab_external/compiling-engine-applications-with-the-mex-command.html#bsq78dr-9

Set the environment variable first:
    export LD_LIBRARY_PATH=/mnt/kaustapps/MATLAB-faculty/R2011b.app/bin/glnxa64/:/mnt/kaustapps/MATLAB-faculty/R2011b.app/sys/os/glnxa64/:$LD_LIBRARY_PATH

UNIX Engine Example engdemo
To verify the build process on your computer, use the C example engdemo.c or the C++ example engdemo.cpp.
Copy one of the programs, for example engdemo.c, to your current working folder:
    copyfile(fullfile(matlabroot,...
        'extern','examples','eng_mat','engdemo.c'),...
        '.', 'f');
Build the executable file:
    mex('-v', '-f', fullfile(matlabroot,...
        'bin','engopts.sh'),...
        'engdemo.c');
Verify that the build worked by looking i...

libsvm usage

FAQ
=====
http://www.csie.ntu.edu.tw/~cjlin/libsvm/faq.html#/Q4:_Training_and_prediction

DOWNLOAD
========
You only need to download one zip file from the main page.
http://www.csie.ntu.edu.tw/~cjlin/libsvm/

INSTALL
==========
    make
If you want to use parameter estimation, you need to change the code a bit and then run:
    make clean; make install;

DATAFORMAT
==============
    label 1:feat#1 2:feat#2 3:feat#3 N:feat#N

Some available data
===============
http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/
Use the heart_scale data. It works for plotting, CV and parameter estimation.

2-class CLASSIFICATION with RBF kernel with 5 fold CV
=================================
train (support vectors are generated):
    ./svm-train -s 0 -t 2 -g 0.03125 -c 0.25 train.dat train.model
train with CV (no support vectors are shown, only your score: AUC, F-score):
    ./svm-train -...
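As a small illustration of the DATAFORMAT line above, this Python sketch turns a label and a dense feature list into one libsvm-style text line with 1-based feature indices. The helper name `to_libsvm_line` and the sample values are made up. libsvm also accepts sparse lines where zero-valued features are left out; this sketch writes every feature for simplicity.

```python
def to_libsvm_line(label, features):
    """Format one sample as 'label 1:v1 2:v2 ...' with 1-based feature indices."""
    pairs = [f"{index}:{value:g}" for index, value in enumerate(features, start=1)]
    return " ".join([str(label)] + pairs)

# Two hypothetical samples: (label, dense feature vector).
samples = [(1, [0.5, 0.0, 2.0]), (-1, [1.25, 3.0, 0.0])]
lines = [to_libsvm_line(label, feats) for label, feats in samples]

for line in lines:
    print(line)
```

Writing these lines to a file, one sample per line, should give something svm-train can read as train.dat.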

Matlab code for Bayesian Network ( Bayes Net ) , Expectation Maximization

http://code.google.com/p/bnt/

matlab cross validation with svm [draft not final]

    function test
    clc;
    matPos = csvread('pos.dat');
    noPos = size(matPos,1);
    noFeature = size(matPos,2);
    labelPos = ones(noPos,1);
    % matPos= [matPos labelPos ];
    matNeg = csvread('neg.dat');
    noNeg = size(matNeg,1);
    labelNeg = -1*ones(noNeg,1);
    % matNeg= [matNeg labelNeg ];
    % svmStruct = svmtrain(featureInTrain,featureOutTrain,'kernel_function','linear' , 'options' ,smo_opts); %,'rbf_sigma',100,'boxconstraint',25
    noFold = 5;
    c = cvpartition([labelPos ; labelNeg],'kfold', noFold);
    strArray = java_array('java.lang.String', 2);
    strArray(1) = java.lang.String('1');
    strArray(2) = java.lang.String('-1');
    myorder = cell(strArray)
    f = @(xtr,ytr,xte,yte) confusionmat(yte,@(xtr,ytr,xte)crossfun(xtr,ytr,xte, exp(z(1)),exp(z(2))),'order', [1 -1] );
    cfMat = crossval(f,[matPos; matNeg], [ labelPos ; labelNeg],'partition',c);
    cfMat = reshape(sum(cfMat),3,3)
    minfn = @(z)crossval('mcr',[matPos; matNeg...

Java Template code

    package com.cbrc.pipeline2;

    import java.io.BufferedReader;
    import java.io.BufferedWriter;
    import java.io.DataInputStream;
    import java.io.FileInputStream;
    import java.io.FileWriter;
    import java.io.InputStreamReader;

    public class Test {
        String foldIn;
        String fnmFasta;
        String foldOut;
        String fnmOut;

        void init(String rootIn, String rootOut, String fnmInFasta, String fOut) {
            this.foldIn = rootIn;
            this.fnmFasta = this.foldIn + fnmInFasta;
            this.foldOut = rootOut;
            this.fnmOut = this.foldOut + fOut;
        }

        void loadFasta(){
            try {
                FileInputStream fstre...

cluster using grid qsub

source /home/ge2011.11/cbrc/common/settings.sh
qsub -cwd -e error.log -b y /usr/bin/blastpgp -i N_C_ternimus_Paxillin.fa -d /home/data/GenomeAA/blastdb/nr -e 1e-10 -m 7 -o psiblastout.xml -j 3
qstat

JAVA SORT COLLECTION List or Vector or arraylist or array

http://docs.oracle.com/javase/tutorial/collections/interfaces/order.html

Sort vector or List

    import java.util.*;

    public class Test {
        // static so that it can be instantiated from the static main method
        static class BedFormat implements Comparable<BedFormat> {
            String chrom;
            int end;

            // order by the end coordinate; returns 0 for equal values
            public int compareTo(BedFormat obj) {
                return Integer.compare(this.end, obj.end);
            }
        }

        public static void main(String[] args) {
            List<BedFormat> vp = new ArrayList<BedFormat>();
            vp.add(new BedFormat());
            vp.add(new BedFormat());
            ....
            vp.add(new BedFormat());
            Collections.sort(vp);
            // print
            // it will print according to end variable of object
        }
    }

Example: Finding Median from a list

    class ExpReplica implements Comparable<ExpReplica> {
        double valReplica;

        public ExpReplica(double valReplica) {
            super();
            this.valReplica = valReplica;
        }

        public int compareTo(ExpReplica obj) { ...

Java Thread tutorial

    // Wait for the child threads to finish their jobs before the main thread ends
    public class TestJoin {
        public static void main(String args[]) {
            Vector<Thread> vecThread = new Vector<Thread>();
            for (int curLen = 1; curLen <= 5; curLen = curLen + 1) {
                vecThread.add(new Thread(new Inner(curLen)));
            }
            for (int i = 0; i < vecThread.size(); i++) {
                vecThread.get(i).start();
            }
            try {
                for (int i = 0; i < vecThread.size(); i++) {
                    vecThread.get(i).join()...

Map reduce / Hadoop tutorial by example

Links
http://www.philippeadjiman.com/blog/2009/12/07/hadoop-tutorial-part-1-setting-up-your-mapreduce-learning-playground/
http://hadooptutorial.wikispaces.com/Custom+combiner

Example

Gaussian Process GP

https://www.youtube.com/watch?v=16oPvgOd3UI

    function GP_1d
    kernel = 5;
    switch kernel
        case 1; k = @(x,y) 1*x'*y;                       % linear
        case 2; k = @(x,y) 1*min(x,y);                   % Brownian motion
        case 3; k = @(x,y) exp(-100*(x-y)'*(x-y));       % squared exponential
        case 4; k = @(x,y) exp(-1*sqrt((x-y)'*(x-y)));   % Ornstein-Uhlenbeck
        case 5; k = @(x,y) exp(-1*sin(5*pi*(x-y))^2);    % periodic
    end
    % choose points at which to sample
    x = (0:.005:1);
    n = length(x);
    % covariance matrix
    C = zeros(n,n);
    for i=1:n
        for j=1:n
            C(i,j) = k(x(i), x(j));
        end
    end
    % sample from the Gaussian process at these points
    u = randn(n,1);
    [A, S, B] = svd(C);
    z = A*sqrt(S)*u;
    % plot
    figure(2); hold on;
    plot(x, z, '.-');
    axis([0, 1, -2, 2]);
    end

============ IN 2D ========

    function GP_2d
    kernel = 3;
    switch kernel
        case 1; k = @(x,y) 1*x'*y; % linear ...
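For readers without MATLAB, here is a rough pure-Python sketch of the same 1-D sampling idea: build the covariance matrix from a kernel, factor it, and multiply the factor by standard normal draws. Two things differ from the MATLAB code and are my own assumptions: it uses a Cholesky factor instead of the SVD, and it adds a small diagonal jitter (1e-6) so the factorization stays numerically stable. The function names and the coarse 11-point grid are illustrative only.

```python
import math
import random

def squared_exponential(x, y, scale=100.0):
    """k(x, y) = exp(-scale * (x - y)^2), the 1-D form of kernel 3 above."""
    return math.exp(-scale * (x - y) ** 2)

def covariance_matrix(points, kernel, jitter=1e-6):
    """Build C[i][j] = kernel(points[i], points[j]), plus jitter on the diagonal."""
    n = len(points)
    return [[kernel(points[i], points[j]) + (jitter if i == j else 0.0)
             for j in range(n)] for i in range(n)]

def cholesky(C):
    """Return lower-triangular L with L * L^T = C (C must be positive definite)."""
    n = len(C)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(C[i][i] - s)
            else:
                L[i][j] = (C[i][j] - s) / L[j][j]
    return L

def sample_gp(points, kernel, seed=0):
    """Draw one zero-mean sample z = L * u, where u is standard normal."""
    rng = random.Random(seed)
    L = cholesky(covariance_matrix(points, kernel))
    u = [rng.gauss(0.0, 1.0) for _ in points]
    return [sum(L[i][k] * u[k] for k in range(i + 1)) for i in range(len(points))]

# Coarse grid on [0, 1]; the MATLAB code uses a much finer step of 0.005.
points = [i / 10 for i in range(11)]
z = sample_gp(points, squared_exponential)
```

Any factor A with A * A^T = C gives samples with covariance C, which is why the Cholesky factor can stand in for the SVD-based `A*sqrt(S)` here.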

Ruby Tutorial

Ruby Installation with graphics (painful)
==============================
* Install RVM for Ruby
  - \curl -#L https://get.rvm.io | bash -s stable --autolibs=3 --ruby
* Install TK from ActiveTcl
  - http://www.activestate.com/activetcl
* Run
  rvm reinstall 2.0.0 --enable-shared --enable-pthread --with-tk --with-tcl

Recommended Sites
==================
1. From author (Yukihiro Matsumoto) http://rubymonk.com/
2. Basic http://www.tutorialspoint.com/ruby/ruby_variables.htm

Interactive Tutorial
==============
http://tryruby.org/

Run ruby
=========
1. Using interpreter
   irb ( start interpreter)
   load "fname.rb" ( load the file)
   ruby "fname.rb" ( run the file)
2. Using ruby command
   ruby "fname.rb" ( run the file)

3. Type of Variables & Method
====================
@instanceVariable
@@classVariable
$GlobalVariable

4. instanceMethod
def fnc ... end

5. ClassMethod
=============
def self.fnc ... end

6. Ano...

Haskell tutorial

Important links for tutorials
===================
http://www.haskell.org/haskellwiki/Haskell_in_5_steps#Where_to_go_from_here
http://www.cs.nott.ac.uk/~gmh/book.html
http://rigaux.org/language-study/syntax-across-languages-per-language/Haskell.html
http://learnyouahaskell.com/chapters

You can change the prompt with :set prompt

0. how to run/quit haskell file
===============================
a. ghci ( it will load Glasgow Haskell Compiler)
b. load file
   > :l filename.hs
c. fncInsideFile parameter
d. :quit

1. Bracket issues
=============
fnc(a)   // wrong
fnc a    // right, parameters are space separated

Parentheses
a. for tuple
   -- let tup = ( 1, "sss")
b. (x:xs) -- x is first element, xs is rest elements
c. for curry function/section. If you don't use () it will not work
   -- correct
   useSection = (/2)
   -- Incorrect
   useSection = /2
instead of using 1 + 2 ...

Ghostscript pdf merge split

1. Cut a range of pages / split pages from a PDF
=========================
gs -sDEVICE=pdfwrite -dNOPAUSE -dBATCH -dSAFER -dFirstPage=3 -dLastPage=7 -sOutputFile=out.pdf in.pdf

2. Merge multiple PDFs
================
gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress -sOutputFile=out.pdf in1.pdf in2.pdf

p value bonferroni correction

http://www.aaos.org/news/aaosnow/apr12/research7.asp

For example, a researcher is testing 20 hypotheses simultaneously, with a critical P value of 0.05. In this case, the following would be true:

P (at least one significant result) = 1 – P (no significant results)
P (at least one significant result) = 1 – (1 – 0.05)^20
P (at least one significant result) = 0.64

Thus, performing 20 tests on a data set yields a 64 percent chance of identifying at least one significant result, even if all of the tests are actually not significant. Therefore, while a given α may be appropriate for each individual comparison, it may not be appropriate for the set of all comparisons.

http://www.fon.hum.uva.nl/praat/manual/Bonferroni_correction.html

In general, if we have k independent significance tests at the α level, the probability p that we will get no significant differences in all these tests is simply the product of the individual probabilities: (1 - α)^k. For exampl...
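The arithmetic in the quoted example can be checked with a few lines of Python. This is only a sketch: the function names are mine, and the last step applies the standard Bonferroni rule of testing each hypothesis at α / k, which is the usual meaning of the correction in the post title.

```python
def familywise_error_rate(alpha, k):
    """P(at least one significant result) for k independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** k

def bonferroni_alpha(alpha, k):
    """Per-test significance level under the Bonferroni correction."""
    return alpha / k

alpha, k = 0.05, 20
uncorrected = familywise_error_rate(alpha, k)
per_test = bonferroni_alpha(alpha, k)
corrected = familywise_error_rate(per_test, k)

print(f"uncorrected family-wise error rate: {uncorrected:.2f}")
print(f"Bonferroni per-test alpha: {per_test}")
print(f"family-wise error rate after correction: {corrected:.4f}")
```

With 20 tests the per-test threshold drops to 0.0025, which keeps the chance of at least one false positive across the whole set just under the original 0.05.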

linux basic system admin tutorial from IBM

Install software pre-compiled or from source
-----------------------------------------------------------------------
http://www.ibm.com/developerworks/linux/library/l-roadmap9/

Managing shared libraries
-----------------------------------------------
http://www.ibm.com/developerworks/linux/library/l-lpic1-v3-102-3/index.html

Basic system admin
-------------------------------------
http://www.ibm.com/developerworks/training/kp/l-kp-command/index.html
http://www.ibm.com/developerworks/linux/tutorials/l-basics/

All types of work for a system admin in Linux
-------------------------------------------------------------------
http://www.ibm.com/developerworks/linux/library/l-lpic1-v3-map/index.html

Solutions for all types of Linux problems
---------------------------------------------------------------
http://www.ibm.com/developerworks/linux/library

C/C++ basic tutorial

Memory allocation stack vs heap
------------------------------------------------------
http://www.learncpp.com/cpp-tutorial/79-the-stack-and-the-heap/

Ghost script manipulate pdf files

1. Merge PDF files
==================
gs -dNOPAUSE -sDEVICE=pdfwrite -sOUTPUTFILE=combinedpdf.pdf -dBATCH 1.pdf 2.pdf 3.pdf