Saturday, December 24, 2011

C/C++ map



map<string, info> mymap;
map<string, info>::iterator it;


// insert into map

it = mymap.find(chrmName);
if ( it != mymap.end() ) { // found id
            it->second.fnc();
 } else { // not found
            mymap.insert( pair<string, info>( chrmName, info() ) );
 }

// iterate over map
cout << "map size:" << mymap.size() << endl;

 map<string, info>::iterator it3;
 int count=0;
 int tot = 0;
  
 for ( it3=mymap.begin() ; it3 != mymap.end(); it3++ ){

        count++;
        cout << it3->first.c_str() << "=>" ;
        cout<< it3->second.getCount() << endl;
        tot = tot+ it3->second.getCount();

    }
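
The find-or-insert pattern and the iteration loop above can be sketched as a self-contained example. `Info` here is a hypothetical stand-in for the `info` class (only `fnc()` and `getCount()` are assumed), and `countNames` / `totalCount` are names made up for this sketch.

```cpp
#include <iostream>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the `info` class used in the note above.
struct Info {
    int count = 0;
    void fnc() { ++count; }                 // called when the key already exists
    int getCount() const { return count; }
};

// Insert each name once; call fnc() on every repeat of a name.
std::map<std::string, Info> countNames(const std::vector<std::string>& names) {
    std::map<std::string, Info> mymap;
    for (const std::string& name : names) {
        std::map<std::string, Info>::iterator it = mymap.find(name);
        if (it != mymap.end()) {
            it->second.fnc();                             // found
        } else {
            mymap.insert(std::make_pair(name, Info()));   // not found
        }
    }
    return mymap;
}

// Print every key/value pair and return the sum of getCount() over the map.
int totalCount(const std::map<std::string, Info>& mymap) {
    int tot = 0;
    for (std::map<std::string, Info>::const_iterator it = mymap.begin();
         it != mymap.end(); ++it) {
        std::cout << it->first << "=>" << it->second.getCount() << std::endl;
        tot += it->second.getCount();
    }
    return tot;
}
```

Usage: call `countNames` with a vector of chromosome names, then `totalCount` on the result. A name seen once keeps count 0, because `fnc()` only runs on repeats.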


C/C++ vector



void fnc() {
     vector<string> nameV;

      nameV.push_back("ab");
      nameV.push_back("cd");


      int sz = nameV.size();
      cout<< " Size of Vector:" << sz << endl;

      for(int i=0; i<sz; i++)
      {
          cout<<  nameV[i] << endl;
      }



}
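
A minimal self-contained version of the vector note, assuming the elements are strings. `makeNames` and `printAll` are hypothetical helper names. The index is a `std::size_t` because `size()` returns an unsigned type.

```cpp
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Build the same two-element vector as in the note above.
std::vector<std::string> makeNames() {
    std::vector<std::string> nameV;
    nameV.push_back("ab");
    nameV.push_back("cd");
    return nameV;
}

// Print every element on its own line; returns how many were printed.
std::size_t printAll(const std::vector<std::string>& v) {
    std::size_t printed = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        std::cout << v[i] << std::endl;
        ++printed;
    }
    return printed;
}
```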

Wednesday, December 21, 2011

JAVA uniform random number [0 , 1)


import java.util.Random;

 Random myrand = new Random();
double zeroToOne = myrand.nextDouble();

JAVA file operation

 FileInputStream fstream = new FileInputStream("allUpper.seq");
 DataInputStream in = new DataInputStream(fstream);
  BufferedReader br = new BufferedReader(new InputStreamReader(in));

BufferedWriter out = new BufferedWriter(new FileWriter("kMer.txt")); // for write

String seqLine;
while ((seqLine = br.readLine()) != null) {
               out.write(seqLine);
               out.write("\n");
  }
br.close();
 in.close();
 fstream.close();
  out.close();

Sunday, December 18, 2011

linux sshfs mount remote folder in local disk


sshfs   alamt@kw2237.rc:/home/KAUST/alamt  /home/tanviralam/kaustmachine/
ssh -X  alamt@kw2237.rc

run matlab
/mnt/kaustapps/MATLAB-faculty/matlab.R2011b



http://www.go2linux.org/sshfs-mount-remote-filesystem-using-ssh

Installation of sshfs
Get the packages
  • For Debian:
    apt-get install fuse-utils sshfs
  • For Ubuntu:
    sudo apt-get install fuse-utils sshfs
  • For Fedora and Centos:
    yum install fuse-utils sshfs
  • For Mandriva:
    urpmi fuse-utils sshfs
Next step is to load the fuse module [local user: tanviralam, remote user: alamt]

1. modprobe fuse

2. Next create the mount point
mkdir /home/tanviralam/remote

chown [local-user]:[your-group] /mnt/remote-fs/

chown tanviralam /home/tanviralam/remote

3. Add yourself to the fuse group

adduser [local-user] fuse
adduser tanviralam fuse

4. Up to this point all commands should be issued as root. Now switch to your own user and mount the remote filesystem.

sshfs remote-user@remote.server:/remote/directory /home/tanviralam/remote/

Wednesday, November 23, 2011

JAVA number formatter


import java.text.DecimalFormat;
import java.text.NumberFormat;

NumberFormat f = new DecimalFormat("#00.00000000");
f.setGroupingUsed(false);
String refinedNumber = f.format(arrT[i]);

Saturday, November 5, 2011

MATLAB cell array


Accessing cell array : {}-for getting array, () - for elem of this array
=============================================
carray{i}
carray{i}(elemIndex)


example : access cell array and string compare
================================

for i=1:totSample % ID
   
    for j=1:totRepeat
       
        a =( promoter_ID{1}(i) ) ;  % check the first column cell array
        b = ( promoter_repeat{1}(j) ); % check the first column cell array
        
        if strcmp(a,b) % yes it overlaps with some repeat
          display('yes')    
        else
          display('no')
        end    
 
    end
end

Thursday, November 3, 2011

linux connect via ssh on cluster


ssh-keygen -t dsa // it will prompt for a passphrase and the name of the key file

1. suppose the file name is mykey. It will generate two files, mykey and mykey.pub

2. it will also generate a hidden folder .ssh. There will be a file named known_hosts in that folder.

3. copy these two files into the .ssh folder

cd .ssh
cp  ../mykey ../mykey.pub .

4. keep this folder with 3 files for future.

5. Once the admin gives you access to the server, type

ssh alamt@cluster.cbrc.kaust.edu.sa

The first time it will prompt for a password. After that it will not.

For cluster 
1. cd user
2. cd DMF6
3. cd ../user/alamt
4. cp -rf DMF6 ../alamt/.
5. cp run_job_vlad.sh run_job_tanvir.sh
6.  chmod a+x run_job_tanvir.sh
7. ./run_job_tanvir.sh node25 CGI tanvir_01
8.


scp Copy file from remote -->  local
======================

open terminal in local machine

scp source destination


> scp   alamt@cluster.cbrc.kaust.edu.sa:/home/alamt/DMF6/CGI.tanvir_01.results/*.rank    /home/tanviralam/tst


scp Copy from local --> remote
======================

scp local remote

> scp /home/tanviralam/myfile.txt alamt@cluster.cbrc.kaust.edu.sa:/home/alamt/store/myfile.txt


sftp  Connect to remote pc using GUI
======================
1. login using ssh alamt@cluster.cbrc.kaust.edu.sa
2. type following in location bar

  sftp://alamt@cluster.cbrc.kaust.edu.sa

3. So you can see and use gui of remote machine.


Cluster commands
==================
1. see number of nodes in cluster:  bhosts

2. See the number of processor in nodes:

ssh node25 cat /proc/cpuinfo | grep -c processor

Or you can login and then see it

    First log in to that node, for example node 25.
    type> ssh node25
    type> cat /proc/cpuinfo | grep -c processor

3. see the running jobs

   type> top

Saturday, October 22, 2011

matlab normalization

Using mean and stddev


 function [featureIn,meanFeatIn, stdDevFeatIn] = mynorm_train(featureIn)
meanFeatIn = mean(featureIn,1);
stdDevFeatIn = std(featureIn,1,1);
noSample = size(featureIn,1);
for i=1:noSample
    featureIn(i,:) = (featureIn(i,:) - meanFeatIn) ./ stdDevFeatIn ;
end
end

 function [testFeatureIn] = mynorm_test(testFeatureIn,meanFeatIn,stdDevFeatIn)
    noSample = size(testFeatureIn,1);
    noInputFeat = size(testFeatureIn,2);
    for i=1:noSample
            testFeatureIn(i,1:noInputFeat) = (testFeatureIn(i,1:noInputFeat) - meanFeatIn ) ./ stdDevFeatIn;          
    end  
end
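
In formula form, the mean/stddev normalization above is, for feature j and sample i (a sketch; N is the number of training samples, and std(featureIn,1,1) divides by N rather than N-1):

```latex
\mu_j = \frac{1}{N}\sum_{i=1}^{N} x_{ij}, \qquad
\sigma_j = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_{ij}-\mu_j\right)^2}, \qquad
z_{ij} = \frac{x_{ij}-\mu_j}{\sigma_j}
```

Test samples are normalized with the \mu_j and \sigma_j computed on the training set, not with their own statistics.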





Using range

 function [ N_feature,feature_range,feature_bases ] = normalize( features )
%NORMALIZE Summary of this function goes here
%   Detailed explanation goes here
% samples are in rows

for NoF = 1:size(features,2)
    F_min(NoF) = min(features(:,NoF));
    F_max(NoF) = max(features(:,NoF));
  
    feature_range(NoF) = (F_max(NoF)-F_min(NoF))/2;
    feature_bases(NoF) = (F_max(NoF)+F_min(NoF))/2;
  
    for NoS = 1:size(features,1)
        if (feature_range(NoF) ~=0)
            N_feature(NoS,NoF) = (features(NoS,NoF)-feature_bases(NoF))/feature_range(NoF);
        else
            N_feature(NoS,NoF)=features(NoS,NoF)-feature_bases(NoF);
        end
    end
end

end
   
function [ feature ] = normalize_t( t_features,range,bases )
%NORMALIZE_T Summary of this function goes here
%   Detailed explanation goes here
range = repmat(range,size(t_features,1),1);
bases = repmat(bases,size(t_features,1),1);
feature = (t_features - bases)./range;
end
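
In formula form, the range normalization above maps each feature j to the interval [-1, 1] (a sketch of what normalize computes; r_j is feature_range and b_j is feature_bases):

```latex
r_j = \frac{\max_i x_{ij} - \min_i x_{ij}}{2}, \qquad
b_j = \frac{\max_i x_{ij} + \min_i x_{ij}}{2}, \qquad
x'_{ij} = \frac{x_{ij} - b_j}{r_j} \in [-1, 1]
```

When r_j = 0 (a constant feature), normalize only subtracts b_j. normalize_t has no such check and divides by the range directly.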

matlab feature ranking

used function rankfeatures (consider sample as column)
====================================================

train = [trainFeature trainLabel];
[IDX ,Z] = rankfeatures(trainFeature' ,trainLabel' ,'Criterion', 'ttest');
%ttest / entropy/ etc...



topRankedFeature = (size(trainLabel,1)) / 2 ; 

classify( testFeature( :,IDX(1:topRankedFeature) ),   ...
          trainFeature( :,IDX(1:topRankedFeature) ), trainLabel, ...
          'diagquadratic' ) % linear/quadratic/diagquadratic etc


% transpose as it takes sample as column vector
%ttest / entropy/ etc...  
%IDX is the list of indices to the rows in X with the most significant features.  
%Z is the absolute value of the criterion used (see below) 


 

MATLAB discriminant analysis

function used classify
===================

predictedClass = classify(testSample,trainSample,trainGroup,'diagquadratic' ) % linear/quadratic/ etc





Wednesday, October 19, 2011

MATLAB check unique string in file

function identifyDuplicate
clc;

uniqueSeq={};
dupSeq={};

index=1;
uniqueIndex=1;
dupIndex=1;
uniq=[];
dup=[];
isDuplicated = 0;
fid = fopen('1400M_from_287PS_287NS.ranked','r');


tline = fgetl(fid); % ******
 while ischar(tline)
    
     consensusSeq = fgetl(fid); % Consessus: AAACC
     consensusSeq = upper(consensusSeq);

     curSeq = sscanf(consensusSeq,'%*s %s', [1, inf]);
     curSeq = upper(curSeq);

     fgetl(fid); % Threshold
     fgetl(fid); % Coverage
     fgetl(fid); % p-value
     fgetl(fid); % r1
     fgetl(fid); % r2
     fgetl(fid); % r3
     fgetl(fid); % r4
    

     for en=1:uniqueIndex -1    
         if strcmp(curSeq,uniqueSeq{en}) % compare without shadowing the built-in exist()
            isDuplicated = 1;
            break;
         end
     end
      
     if( isDuplicated == 1 ) % already exist       
         dupSeq{dupIndex}  = {curSeq};      
         dupIndex = dupIndex + 1;
         dup = [dup;index];
     else % not found
        
         uniqueSeq{uniqueIndex} = {curSeq};  
         uniqueIndex = uniqueIndex + 1;
         uniq = [ uniq;index];
     end
      
   
     
     tline = fgetl(fid); % next ******
     index = index + 1;
     isDuplicated = 0;
    
    
 end


 dlmwrite('unique',uniq,'\t'); % index of unique entry
 dlmwrite('dup'   ,dup   ,'\t'); % index of duplicate entry


fclose(fid);


Tuesday, October 11, 2011

MATLAB cross validation

% use built-in function
samplesize = size( matrix , 1);
c = cvpartition(samplesize,  'kfold' , k); % return the indexes on each fold

% output in matlab console:
% K-fold cross validation partition
%              N: 10
%    NumTestSets: 4
%      TrainSize: 8  7  7  8
%       TestSize: 2  3  3  2

for i=1 : k
   trainIdxs = find(training(c,i) ); % training(c,i): 1 means in train , 0 means in test
   testIdxs  = find(test(c,i)     ); % test(c,i):     1 means in test , 0 means in train

   trainMatrix = matrix ( trainIdxs , : );
   testMatrix  = matrix ( testIdxs  , : );
end

% now calculate performance


%%  calculate performance of a partition
    selectedKfoldSen=[];selectedKfoldSpe=[];selectedKfoldAcc=[];
    indexSen=1;indexSpe=1;indexAcc=1;
    if ( kfold == (P+N) )% leave one out
        sensitivity = sum(cvtp) /( sum(cvtp) + sum(cvfn) )
        specificity = sum(cvtn) /( sum(cvfp) + sum(cvtn) )
        accuracy = (sum(cvtp)+sum(cvtn)) / ( sum(cvtp) + sum(cvfn) + sum(cvfp) + sum(cvtn) )
       
    else
       
        sensitivity=[]; specificity=[];accuracy=[];
        for i=1: kfold
            if( ( cvtp(i) + cvfn(i) )==0) % no POSITIVE sample was selected for evaluation
                % sensitivity(i) = 1 ;
            else
                sensitivity(indexSen) = cvtp(i) /( cvtp(i) + cvfn(i) ) ;     
                indexSen = indexSen + 1;
                selectedKfoldSen = [selectedKfoldSen i];
            end
           
            if ( cvfp(i) + cvtn(i) ) ==0 % no NEGATIVE sample was selected for evaluation
                   %  specificity(i)=  1 ;
            else
                specificity(indexSpe)=  cvtn(i) /( cvfp(i) + cvtn(i) ) ;
                indexSpe = indexSpe + 1;
                selectedKfoldSpe = [selectedKfoldSpe i];
            end
            accuracy(i) = (cvtp(i)+ cvtn(i)) / ( cvtp(i) + cvfn(i) + cvfp(i) + cvtn(i) );
        end
       
        sen = mean(sensitivity)
        spe = mean(specificity)
        acc = mean(accuracy)
       
    end

   
    dlmwrite('cv',[ cvtp' ] , 'delimiter','\t','-append');
    dlmwrite('cv',[ cvfn' ] , 'delimiter','\t','-append');
    dlmwrite('cv',[ cvfp' ] , 'delimiter','\t','-append');
    dlmwrite('cv',[ cvtn']  , 'delimiter','\t','-append');
   
     dlmwrite('cv',[ selectedKfoldSen]  , 'delimiter','\t','-append');
    dlmwrite('cv',[ selectedKfoldSpe]  , 'delimiter','\t','-append');
   
    dlmwrite('cv',[ sensitivity]  , 'delimiter','\t','-append');
    dlmwrite('cv',[ specificity]  , 'delimiter','\t','-append');
    dlmwrite('cv',[ accuracy]     , 'delimiter','\t','-append');

Sunday, October 9, 2011

MATLAB distance based learning

kNN
=========
all=        [ 1 2 ; 3 4 ; 5 6 ; 7 8; 9 10];
newpoint = [ 1 7];
[indexes,distances] = knnsearch(all , newpoint,'k', 3) % 3 nearest neighbour




Thursday, October 6, 2011

MATLAB confusion matrix


%  test_class  & predicted_class must be same dimension
% 'order' - describes the order of label. Here labels are 'g' as positive and 'h' as negative

[C,order] = confusionmat( test_class(1: noSampleTest), predicted_class, 'order', ['g' ;'h'] )
tp = C(1,1);
fn = C(1,2);
fp = C(2,1);
tn = C(2,2);
sensitivity = tp /( tp + fn )
specificity = tn /( fp + tn )
accuracy = (tp+tn) / (tp+fn+fp+tn)
tpr = sensitivity
fpr = 1-specificity
precision = tp /( tp + fp )
fVal = (2*tpr*precision)/(tpr+precision)
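
Written out as formulas, the quantities computed above from the confusion matrix are (TP, FN, FP, TN correspond to tp, fn, fp, tn):

```latex
\text{sensitivity} = \mathrm{TPR} = \frac{TP}{TP+FN}, \qquad
\text{specificity} = \frac{TN}{FP+TN}, \qquad
\mathrm{FPR} = 1 - \text{specificity}

\text{accuracy} = \frac{TP+TN}{TP+FN+FP+TN}, \qquad
\text{precision} = \frac{TP}{TP+FP}, \qquad
F = \frac{2\cdot\mathrm{TPR}\cdot\text{precision}}{\mathrm{TPR}+\text{precision}}
```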

MATLAB string manipulation



string cell array
================
ar = {'aa';'bbbbb'}



mat2str   (convert number to string)
======================
parameter = '-s 0 -t 0 -c ';
for c=1:2
    nextp = [parameter mat2str(  c) ]
end

string comparison
================
 sa='ab';
 sb='aba';
 t = strcmp(sa , sb )
if( t == 1)
  display('Yes, match');
else
   display('No match')
end


strmatch  find exact string in cell array
=======================
list = {'max', 'minimax', 'maximum', 'max'}
x = strmatch('max',list,'exact')

find  string starts with in cell array
=======================
list = {'max', 'minimax', 'maximum', 'max'}
x = strmatch('max',list)

String Comparison
================
 strmatch('ab' , 'abc') ; % returns 1 ('abc' starts with 'ab')
strcmp('ab' , 'abc') ; % returns 0 (strings are not identical)

Find index of  char/ string strfind  find the pattern
====================
S = 'Find the starting indices of the pattern string';
strfind(S, 'in')
ans =
     2    15    19    45
 
Sub String
===================
 
s = s(1 : n) 

String trim
==========
out = strtrim(ins)

Reading from file  % parse each line
===========================
function createLogo
motifCount=0;
fNamePPM = '../gene.features.v2/v2_C_287_NC_48_profile.model'; % columns as Letter
fid = fopen(fNamePPM);

tline = fgets(fid);
while ischar(tline)
      if tline(1) =='*' % start of a motif    
           motifCount = motifCount + 1;
       
           tline = fgets(fid); % read ID
           %% tokenize the line
           remain  = tline;
           countToken = 0;
           while true
                [word, remain] = strtok(remain);
                if isempty( word  )
                    break;
                end
                countToken = countToken +1;
                if countToken ==2
                   motifLen = str2num(word);
                   profileMatrix = zeros(4,motifLen);
                end
            disp( word  );
           end

        tline = fgets(fid); % score
        tline = fgets(fid); % A
        profileMatrix(1,1:motifLen) = str2num(tline);
     end
     tline = fgets(fid); % to read for next iteration
   
end
fclose(fid);
end

String tokenizer ( strtok )
==============
remain  = wholeString;
    while true
       [word, remain] = strtok(remain);
       if isempty( word  ),  break;  end
       disp( word  );  
    end

% convert tab delimited line into vector of  number
=====================================
myvector = str2num( myLine );

Read file with FIXED COLUMN like matrix but content may be string

fid = fopen('scan1.dat');
C = textscan(fid, '%d\t%f\t%f\t%f\t%f'); % each row have 5 column
C{1} (1) ; % first column , 1st row
C{1} (2) ; % first column , 2nd row
fclose(fid);

[ covTarget covBG score source consensus ] = textread(fnameStat,'%f\t%f\t%f\t%s\t%s');
% here in the file there are 5 columns , first 3 are numeric, and last 2 are string


Write cell array (dlmwrite does not work for cell arrays)


fid = fopen('LLGencodeCommonID.txt', 'w');
fprintf(fid, '%s\n', commonID{2}); % write 2nd column
fclose(fid);



sscanf:  Parsing word inside of a string
=====================
     consensusSeq = fgetl(fid) % Consessus: AAACC
     curSeq = sscanf(consensusSeq,'%*s %s', [1, inf]) % curSeq = AAACC


Tuesday, October 4, 2011

MATLAB normalize train and test



matrixTrain = load('matrix.train');
 function [matrixTrain , meanFeatIn, stdDevFeatIn, meanFeatOut, stdDevFeatOut] = mynorm_train(matrixTrain)


featureIn = matrixTrain(:,1:end-1);
featureOut = matrixTrain(:,end);

meanFeatIn = mean(featureIn,1);
stdDevFeatIn = std(featureIn,1,1);
meanFeatOut = mean(featureOut,1);
stdDevFeatOut = std(featureOut,1,1) ; 
dlmwrite('normInfo',[meanFeatOut stdDevFeatOut],'delimiter','\t');
noSample = size(featureIn,1);
 for i=1:noSample
            featureIn(i,:) = (featureIn(i,:) - meanFeatIn) ./ stdDevFeatIn ;
            featureOut(i,:) = (featureOut(i,:) - meanFeatOut) ./ stdDevFeatOut ;
 end
matrixTrain = [ featureIn featureOut];

end

matrixTest = load('matrix.test');
function [matrixTest] = mynorm_test(matrixTest,meanFeatIn, stdDevFeatIn,meanFeatOut ,stdDevFeatOut )

noSample = size(matrixTest,1);
noInputFeat = size(matrixTest,2) - 1;
for i=1:noSample
            matrixTest(i,1:noInputFeat) = (matrixTest(i,1:noInputFeat) - meanFeatIn )  ./ stdDevFeatIn ;
            matrixTest(i,noInputFeat+1) = (matrixTest(i,noInputFeat+1) - meanFeatOut )  ./ stdDevFeatOut ;  
end  
end






Sunday, October 2, 2011

matlab matrix to weka .arff format conversion


inputFormat ( matrix is tab separated, last column indicates the label. This is for a two-class problem;
for multiclass a few lines of the code need to change)
======================================================================
5.0  6.5  7.9 +1
6.6  8.9  6.1 -1
code
=======
function matlabToarff


% convert matrix into ARFF (Attribute-Relation File Format) format
clc;
fNameData = 'seqLabel';
fNameARFF = 'seqLabel.arff';

fidARFF = fopen( fNameARFF ,'w');
matrix = load(fNameData);
feature = matrix ( : , 1:end-1);
label = matrix (: , end) ;
noFeature = size(feature,2);
noSample = size(feature,1);

%%%%%%%%%% header

fprintf(fidARFF,'%s\n\n','@RELATION LNCRNAsequence');
for i=1:noFeature % noFeature
         fprintf(fidARFF,'%s\t%d\t%s\n' ,'@ATTRIBUTE' , i, 'NUMERIC' );
end
fprintf(fidARFF,'%s\n\n','@ATTRIBUTE class {+1,-1 }');

%%%%%%%%%%  data
fprintf(fidARFF,'%s\n','@DATA');
for r=1:noSample
     for c=1:noFeature
          fprintf(fidARFF,'%f,',matrix(r,c) );
     end
     if label(r)==1
            fprintf(fidARFF,'%s\n', '+1');
     else
            fprintf(fidARFF,'%s\n', '-1');
     end
end

fclose(fidARFF);

end

Thursday, September 29, 2011

matlab matrix to svm format conversion


inputFormat
============
28.7967,16.0021,2.6449,0.3918,0.1982,27.7004,22.011,-8.2027,40.092,81.8828,g

svmFormat
=============
g    1:28.796700    2:16.002100    3:2.644900    4:0.391800    5:0.198200    6:27.700400    7:22.011000    8:-8.202700    9:40.092000    10:81.882800

code
=======

fid = fopen('magic04.data','r');
raw_data = textscan(fid,'%f %f %f %f %f %f %f %f %f %f  %c','delimiter',',');
data = [raw_data{1:10}];
class = raw_data{11};
fclose(fid);

svmTrain = fopen( 'magic04.data.svm' ,'w');
noRow = size(data,1);
noCol = size(data,2);
for r=1:noRow
    label = class(r);
    fprintf(svmTrain,'%s\t',label );
    for c=1:noCol
            fprintf(svmTrain,'%d:%f\t',c,data(r,c) );
    end
    fprintf(svmTrain,'\n' );
end
fclose(svmTrain);

Tuesday, September 27, 2011

MATLAB file operation


 ============== TEXTSCAN=================
fid = fopen('magic04.data','r');
raw_data = textscan(fid,'%f %f %f %f %f %f %f %f %f %f  %c','delimiter',',');
data = [raw_data{1:10}];
class = raw_data{11};
fclose(fid);


============== FGETL , FPRINTF =================
fid = fopen('positive.seq','r');
fidTrain = fopen('randomTrain.seq','w');

 tline = fgetl(fid);
 while ischar(tline)
     disp(tline);
      fprintf(fidTrain ,'%s\n',upper(tline));
     tline = fgetl(fid);
 
 end

fclose(fidTrain );
fclose(fid);

Saturday, September 24, 2011

MATLAB model evaluation sensitivity/specificity/tpRate/fpRate



Confusion matrix
================
% 1- +ve class
% 0  -ve class
% if you change order change the 'order' parameter in confusionmat function

[C,order] = confusionmat( originalOut , predictedOut,'order', [1 0])
sensitivity = C(1,1)/(C(1,1) + C(1,2))
specificity = C(2,2)/(C(2,1) + C(2,2))

MATLAB decision tree classregtree both classification and regression


matrixTrain = load('primate.train' );
featureInTrain = matrixTrain( :, 1:end-1);
featureOutTrain = matrixTrain(:,end);

matrixTest = load('primate.test' );
featureInTest = matrixTest( :, 1:end-1);
featureOutTest = matrixTest(:,end);


% tree
% t = classregtree(featureInTrain,featureOutTrain,'method','classification');
% predictedOut =str2double( eval(t,featureInTest))

%tree bagger
bnew = TreeBagger(10 ,featureInTrain , featureOutTrain, 'Method','classification') % for 10 tree
predictedOut = predict(bnew, featureInTest)
predictedOut = str2double(predictedOut)


t = bnew.Trees{1,1}
t =bnew.Trees{1,2}
t =bnew.Trees{1,3}
... ... ...
t =bnew.Trees{1,10}



Wednesday, September 21, 2011

MATLAB ANN artificial neural network train test


sample1: 1 2 3 4  label: A
sample2: 1 5 7 7  label: B

  Every sample must be put in a column
=============================
featureIn
1 1
2 5
3 7
4 7
featureOut
A
B

function [yPredict] = doBP(trainFeature,trainValue)

trainFeature = trainFeature'; % to fit matlab format
trainValue = trainValue';% to fit matlab format



% % version 2010a
 net=newff(trainFeature,trainValue,[13 1],{'tansig' 'purelin'}); % tansig purelin

% version 2009a
% net=newff(trainFeature,trainValue,[13 1]);
% net.layers{1}.transferFcn = 'tansig';
% net.layers{2}.transferFcn = 'purelin';



net=init(net);

net.trainParam.epochs = 99999999;
net.trainParam.goal = 0.0000001; %(stop training if the error goal hit)
net.trainParam.lr= 0.000001; % (learning rate, not default trainlm) [0.01]
% net.trainParam.lr_dec = 0.000001;
% net.trainParam.mc = 0.9;
% net.trainParam.min_grad = 1e-10;
net.trainParam.show=1 ; %(no. epochs between showing error) [25]
net.trainParam.time =100000; %    (Max time to train in sec) [inf]
net.trainFcn = 'trainlm'; % trainrp trainbfg  trainlm

net.divideParam.trainRatio = 80/100;  % Adjust as desired
net.divideParam.valRatio = 20/100;  % Adjust as desired
net.divideParam.testRatio = 0/100;  % Adjust as desired


% TRAIN
[net,tr,Ytrain,E,Pf,Af] = train(net,trainFeature,trainValue);  %train(net,subset_active_input',subset_active_output');
yPredict = Ytrain; % network output on the training data (return value of doBP)
plotperf(tr);

save net; % will save the network (net) as net.mat.

end

function predictedY = doTesting(testFeatureIn)

      testFeatureIn = testFeatureIn';  % to fit matlab format
      load net ;% will retrieve the network and put it in your workspace

      [predictedY,Pf,Af,E,perf] = sim(net,testFeatureIn);
end

Tuesday, September 20, 2011

C/C++ strtok string tokenizer

    
    char delims[] = "\t \n";
    char *result = NULL; // always hold the token serially

    result = strtok( curline, delims ); // get first token
    count = 0;
    while( result != NULL ) {
          count++;
        switch(count){
            case 1:
               strcpy(chromosomename, result);
               break;
            case 2:
               strcpy(sStart,result);
               startIndex=atoi(result);
               break;
            default:
               fprintf(fppromotor,"%s\t",result);
               break;
        }

        result = strtok( NULL, delims ); // get next token

    }
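
The same strtok loop as a self-contained sketch that collects the tokens instead of dispatching on their position. `tokenize` is a hypothetical helper name. Because strtok overwrites delimiters in its input, the function works on a private copy of the line.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Split `line` on tab, space and newline, the same delimiters as above.
std::vector<std::string> tokenize(const std::string& line) {
    // strtok modifies the buffer it scans, so copy the line first.
    std::vector<char> buf(line.begin(), line.end());
    buf.push_back('\0');

    const char delims[] = "\t \n";
    std::vector<std::string> tokens;
    for (char* tok = std::strtok(buf.data(), delims);   // get first token
         tok != NULL;
         tok = std::strtok(NULL, delims)) {             // get next token
        tokens.push_back(tok);
    }
    return tokens;
}
```

Note that strtok keeps hidden static state, so two tokenizations cannot be interleaved, and it skips empty fields between consecutive delimiters.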

Monday, September 19, 2011

File Operation in C++

    ifstream  fpA;

    fpA.open( "data",    ios::in );




    fpA.close();
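
The note above only opens and closes the stream. A minimal sketch of the reading step in between, assuming the file is plain text read line by line (`readLines` is a hypothetical helper name):

```cpp
#include <fstream>
#include <string>
#include <vector>

// Read every line of `path`. Returns an empty vector if the file cannot be opened.
std::vector<std::string> readLines(const std::string& path) {
    std::vector<std::string> lines;
    std::ifstream fpA(path.c_str(), std::ios::in);
    if (!fpA.is_open()) {
        return lines;
    }
    std::string line;
    while (std::getline(fpA, line)) {   // stops at end of file or on a read error
        lines.push_back(line);
    }
    fpA.close();
    return lines;
}
```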


Sunday, September 18, 2011

MATLAB load textscan or save a matrix

load
============
matrix = load('engine.train');


textscan
==========================

fd = fopen('magic04.data','r');
raw_data = textscan(fd,'%f %f %f %f %f %f %f %f %f %f  %c','delimiter',',');
data = [raw_data{1:10}];
class = raw_data{11};

saving
=============

dlmwrite('engine.train',[trainFeatureIn trainFeatureOut] , '\t');


Saturday, August 20, 2011

C/C++ map [Object type]

class info
{
    char infoLine[MAXLINELEN];
    char infoMark[NOSPECIES];
    int count;

public:

    //info(){  strcpy(infoLine,""); strcpy(infoMark,""); for(int i=0;i

    // for new  entry
    info(char* li, int index, char mr){

        strcpy(infoLine,li);
        //cout << infoLine << endl;
        for(int i=0; i<NOSPECIES; i++)
            infoMark[i]='+' ;
        infoMark[index]=mr;
        count = 1;
    }

    // for existing entry
    void insertmark(int index, char currentMark) { infoMark[index]=currentMark; count++;  }
    void putEndmark(int index)
    {
        infoMark[index]='\0' ;
    }
    int getCount()  {return count;}
    char *getLine() {return infoLine; }
    char *getMark() {return infoMark; }

};


map<string, info> mymap; // the key must be string; a custom key type would need an operator< overload
map<string, info>::iterator it;


it = mymap.find(ID);
if ( it !=mymap.end() ) // found id
        {
           it->second.insertmark( specNo-1,'-');
        }else // not found
        {
           mymap.insert( pair<string, info>( ID,info(line2,specNo-1,'-')  )  );
        }



C/C++ map [primitive type]

void testMap()
{

    map<string, int> mymap;
    map<string, int>::iterator rit;

    mymap["x1"] = 100;
    mymap.insert(pair<string, int>("y2",150));

    mymap["y2"] = 200;
    mymap["y2"] = 400; // this value will replace previous one.


    rit = mymap.find("xx");
    if ( rit !=mymap.end() ) // found id
    {
        rit->second= 11111; // change existing content
    }else // not found
    {

        mymap.insert( pair<string, int>("xx",150)    );
    }
    // show content:
    for ( rit=mymap.begin() ; rit != mymap.end(); rit++ )
        cout << rit->first << " => " << rit->second << endl;

}
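
One difference worth keeping in mind: operator[] overwrites an existing value, while insert leaves an existing key untouched and reports that through the bool in its return value. A small sketch (`insertIfAbsent` is a hypothetical helper name):

```cpp
#include <map>
#include <string>
#include <utility>

// Returns true if `key` was newly inserted, false if it already existed.
// In the second case the stored value is left unchanged.
bool insertIfAbsent(std::map<std::string, int>& m,
                    const std::string& key, int value) {
    std::pair<std::map<std::string, int>::iterator, bool> res =
        m.insert(std::make_pair(key, value));
    return res.second;
}
```

So the find / insert branch in testMap can be replaced by a single insert when the existing value should be kept, and by operator[] when it should be replaced.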

Tuesday, August 16, 2011

MATLAB som

% 1 4 5 7 -- sample 1
% 2 5 6 6 -- sample 2

% in cndx = you will get  the position of each training sample in cluster no
% as SOM takes each sample as a column we need to transpose feature before applying on SOM

%  -------  train -----------
trainFeature = [ 1 4 5 7 ; 2 5 6 6 ]; % Two training samples with 4 features each. Each row is a sample; the transpose below puts each sample in a column
net = newsom(  trainFeature'  ,[2 2] ); % 4 cluster
net.trainParam.epochs = 100;
[net,tr,Y,E,Pf,Af] = train(net, trainFeature'  );
distances = dist( trainFeature  ,net.IW{1}' );
[d,cndx_train ] = min(distances,[],2); % cndx gives the cluster index, d gives the distance
cndx_train % the position of each training sample in cluster no


bookKeep=zeros(SOMd1 * SOMd2 ,2);%one column for +ve,one column for -ve
    for c=1: totTrain
        index = cndx_train(c);
        if(c<=totPosTrain)
            bookKeep( index,1) = bookKeep( index,1) + 1; %+ve sample
        else
            bookKeep( index,2) = bookKeep( index,2) + 1; %-ve sample
        end
    end
    dlmwrite('clusterInfo',bookKeep,'\t');

% ---------  test  -------------

out = sim(net, featureInTest' );
% format: clusterNo*testSample : 0 0 0 1 0 0 0 0 % that means 4th sample in this cluster;
% outFormat (clusterNo, serial )

     out = out'; % #testSample * clusterNo 
    dlmwrite('out',out,'\t');
    for c=1:totTest
      
        clusterNo(c) = find( out(c ,:) );     
        if c<=totPosTest
            origLabel(c) = 1;
        else
            origLabel(c) = -1;
        end
      
    end
    dlmwrite('predicted',[origLabel' clusterNo'],'\t');

MATLAB read random line if line size is fixed

NUM_EXAMPLE = 6;
NOOFTRAINING = 6;
NOOFCHARPERLINE = 10 ;
NOOFOFFSETPERLINE = NOOFCHARPERLINE+ 2; % the line terminator may take 1 or 2 bytes, depending on how the data was written into the file


fid = fopen( 'D:\KAUST\2.Winter2010-11\SpliceSites\test.txt','r');


for i=1: NOOFTRAINING
   
   rowno = ceil(rand(1)*NUM_EXAMPLE) ; % random row number in 1..NUM_EXAMPLE
  
   offset = (rowno - 1 ) * NOOFOFFSETPERLINE;
   fseek(fid,offset,'bof');
   line = fgetl(fid);
   frewind(fid);
  
   disp(rowno);
%    disp(offset);
   disp(line);
  
end

fclose(fid);

MATLAB entropy calculation

X = [ .5 .5];
lX = log2(X);
ent = -(X * lX' )
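
The line above is the Shannon entropy of a discrete distribution, with the sum written as a row-times-column product. For X = [.5 .5] it works out to 1 bit:

```latex
H(X) = -\sum_{i} p_i \log_2 p_i
     = -\left(0.5\cdot\log_2 0.5 + 0.5\cdot\log_2 0.5\right)
     = -\left(0.5\cdot(-1) + 0.5\cdot(-1)\right) = 1
```

Any p_i = 0 would give log2(0) = -Inf and a NaN in the product, so zero entries need to be removed before applying the one-liner.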

MATLAB read excel file

Basic Mode :If microsoft Excel is not installed
=============================

1. Excel file must be saved in win95 format. 97/98/2000 may not work.

2. Can not read range. Read whole file.

 

    %  or read the first sheet
   ---------------------------------------

    all = xlsread( 'filename' );


    %  or read the specified sheet
   ---------------------------------

all = xlsread( 'filename','sheetname','' , 'basic'  )

MATLAB neural network

2010a
================
net=newff(trainFeature,trainValue,[8 1],{'tansig' 'purelin'});

2009a
===============

net=newff(trainFeature,trainValue,[8 1]);
net.layers{1}.transferFcn = 'tansig';
net.layers{2}.transferFcn = 'purelin'


save net
==================
 save net;

%-------------------- Radial Basis Network No train -------------------

  net=newrb(trainFeature,trainValue,0.0, 1, 100, 1);
 save net;

NO TRAIN . IT WILL AUTOMATICALLY CREATE NEW NEURONS


load net for testing
====================

load net;

MATLAB adding noise into data


add percent  % noise to each value
====================================
percent=.03;

sample = sampleMatrix(index, :);
noiseSample=[];
for i=1 :instanceSize
       val = sample(i);
       smallPerc = val * percent ; % noise magnitude is `percent` of the value
       noise = smallPerc* [ rand(1) - 0.5 ]; % .*T
       noisyVal = val + noise ; % small is very small 10e-9
       noiseSample = [noiseSample noisyVal];
end
noiseMatrix = [ noiseMatrix ; noiseSample];

Add little noise to X
======================

rand( 'state', 0 ) % any number for random number
randn('state', 0 ) % any number for random number
small = 10e-11
small* [ rand(size(X)) - .5 ] .* X + X  % small is very small 10e-9

index = ceil(a + (b-a).*rand(1) ); %

Reading file in C


void readLine()
{
    FILE *fpTest;
    fpTest = fopen ( "test.txt", "r" );
    int countRead = 0,countLine = 0,countCodeDataLine = 0,length = 0;
    if(fpTest ==NULL){
        printf("error in opening test.txt file" ); // perror
        exit(0);
    }
   
   
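    char line[MAXATTRLINELEN];   // line buffer (drop this if `line` is already declared elsewhere)
    while( fgets ( line, MAXATTRLINELEN, fpTest ) != NULL )   // stop at EOF instead of testing feof() before the read
    {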
        countRead++;
        length = strlen(line);
        printf("[%d]:%s(lineLength=%d) \n",countLine,line,length);
       
        if(length > 0)
        {
            countLine++;
        }
       
        // count countCodeDataLine
        if(length == 0) ;
        else if(length ==1)
        {
            //printf(" %d  %d %d %d %d",line[0], '\n', '\r' , '\r\n', '\n\r');
            if(line[0]==' ' || line[0] == '\n'  || line[0] == '\t' ) ;
            else
                countCodeDataLine++;
        }else
        {
            countCodeDataLine++;   
        }
       
        line[0] = '\0';
       

    }
   
   
    printf(" Read %d . Line  %d . Dataline  %d\n",countRead, countLine,countCodeDataLine);

    fclose ( fpTest );
   
}
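
A compact sketch of the same counting idea, using the return value of fgets as the loop condition. The names `LineCounts` and `countLines` and the 1024-byte buffer are assumptions for this example. A line longer than the buffer would be counted as several pieces.

```cpp
#include <cstdio>
#include <cstring>

struct LineCounts {
    int total;      // every line read
    int nonBlank;   // lines containing something other than whitespace
};

// Count lines in `path`. Returns false if the file cannot be opened.
bool countLines(const char* path, LineCounts* out) {
    std::FILE* fp = std::fopen(path, "r");
    if (fp == NULL) {
        std::perror(path);
        return false;
    }
    char line[1024];   // assumed maximum line length
    out->total = 0;
    out->nonBlank = 0;
    while (std::fgets(line, sizeof line, fp) != NULL) {
        out->total++;
        // strspn gives the length of the leading run of whitespace characters;
        // if that run does not cover the whole line, the line has content.
        if (std::strspn(line, " \t\r\n") != std::strlen(line)) {
            out->nonBlank++;
        }
    }
    std::fclose(fp);
    return true;
}
```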

Dynamic array in C/C++


Two dim float malloc
====================

    float ** PC;

    PC = (float**) malloc(DIM_PC_X * sizeof(float*)); // [DIM_PC_X][DIM_PC_Y];
    for(int i = 0; i < DIM_PC_X; i++)
        PC[i] = (float*) malloc(DIM_PC_Y * sizeof(float));



Two dim int using new
============================

int sizeX=5,sizeY = 2;
   
int** ary = new int*[sizeX];
   
for(int i = 0; i < sizeX; ++i)
       
    ary[i] = new int[sizeY];
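
Neither snippet above releases the memory. A sketch of matching allocate / free helpers for the `new` variant (`alloc2d` and `free2d` are hypothetical names). For the malloc variant the same two-level loop applies with free().

```cpp
#include <cstddef>

// Allocate a rows x cols array of float, zero-initialised.
float** alloc2d(std::size_t rows, std::size_t cols) {
    float** a = new float*[rows];
    for (std::size_t i = 0; i < rows; ++i) {
        a[i] = new float[cols]();   // trailing () zero-initialises the row
    }
    return a;
}

// Release in the reverse order of allocation: each row first, then the row table.
void free2d(float** a, std::size_t rows) {
    for (std::size_t i = 0; i < rows; ++i) {
        delete[] a[i];
    }
    delete[] a;
}
```

If manual memory management is not required, `std::vector<std::vector<float> >` gives the same a[i][j] access and frees itself.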

Running openmp in eclipse

To run OpenMP code in a gcc C++ project, it has to be compiled and linked with the -fopenmp option:

g++ -fopenmp

To configure this in Eclipse, add -fopenmp to both the GCC C++ compiler options and the GCC C++ linker options.
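
A small sketch to check that the flag is picked up (`parallelSum` is a hypothetical name). With -fopenmp the loop is split across threads. Without the flag the pragma is ignored and the loop runs serially with the same result.

```cpp
#include <vector>

// Sum the elements of v. The reduction clause gives each thread a private
// partial sum and adds the partial sums together at the end of the loop.
long parallelSum(const std::vector<int>& v) {
    long sum = 0;
    const int n = static_cast<int>(v.size());
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; ++i) {
        sum += v[i];
    }
    return sum;
}
```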