
Posts

Showing posts from 2011

C/C++ map

map<string,info> mymap;
map<string,info>::iterator it;

// insert into map
it = mymap.find(chrmName);
if ( it != mymap.end() ) // found id
{
    it->second.fnc();
}
else // not found
{
    mymap.insert( pair<string,info>( chrmName, info() ) );
}

// iterate over map
cout << "map size:" << mymap.size() << endl;
map<string,info>::iterator it3;
int count = 0;
int tot = 0;
for ( it3 = mymap.begin(); it3 != mymap.end(); it3++ )
{
    count++;
    cout << it3->first.c_str() << " => ";
    cout << it3->second.getCount() << endl;
    tot = tot + it3->second.getCount();
}

C/C++ vector

void fnc()
{
    vector<string> nameV;
    nameV.push_back("ab");
    nameV.push_back("cd");
    int sz = nameV.size();
    cout << " Size of Vector:" << sz << endl;
    for(int i = 0; i < sz; i++)
    {
        cout << nameV[i] << endl;
    }
}

JAVA file operation

FileInputStream fstream = new FileInputStream("allUpper.seq");
DataInputStream in = new DataInputStream(fstream);
BufferedReader br = new BufferedReader(new InputStreamReader(in));
BufferedWriter out = new BufferedWriter(new FileWriter("kMer.txt")); // for write

while ((seqLine = br.readLine()) != null) {
    out.write(seqLine);
    out.write("\n");
}

br.close();
in.close();
fstream.close();
out.close();

linux sshfs mount remote folder in local disk

sshfs alamt@kw2237.rc:/home/KAUST/alamt /home/tanviralam/kaustmachine/
ssh -X alamt@kw2237.rc
run matlab: /mnt/kaustapps/MATLAB-faculty/matlab.R2011b

Reference: http://www.go2linux.org/sshfs-mount-remote-filesystem-using-ssh

Installation of sshfs
Get the packages:
For Debian: apt-get install fuse-utils sshfs
For Ubuntu: sudo apt-get install fuse-utils sshfs
For Fedora and CentOS: yum install fuse-utils sshfs
For Mandriva: urpmi fuse-utils sshfs

Next step is to load the fuse module and mount [local user: tanviralam, remote user: alamt]:
1. modprobe fuse
2. Create the mount point:
   mkdir /home/tanviralam/remote
   chown [local-user]:[your-group] /mnt/remote-fs/
   chown tanviralam:777 /home/tanviralam/remote
3. Add yourself to the fuse group:
   adduser [local-user] fuse
   adduser tanviralam fuse
4. Until here all the commands should be issued as root; now switch to your user and mount the remote filesystem:
   sshfs remote-user@remote.server:/remote/direc

MATLAB cell array

Accessing a cell array: {} - for getting the array, () - for an element of that array
=============================================
carray{i}
carray{i}(elemIndex)

Example: access a cell array and compare strings
================================
for i=1:totSample % ID
    for j=1:totRepeat
        a = ( promoter_ID{1}(i) );     % check the first column cell array
        b = ( promoter_repeat{1}(j) ); % check the first column cell array
        if strcmp(a,b) % yes, it overlaps with some repeat
            display('yes')
        else
            display('no')
        end
    end
end
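
A tiny standalone illustration of the ()-versus-{} difference (the variable names here are made up for the example, not from the code above):

c = {'abc', [10 20 30]};  % a 1x2 cell array
p = c(2);                 % () indexing returns a 1x1 cell array
v = c{2};                 % {} indexing returns the contents, here the vector [10 20 30]
v(1)                      % elements of the contents can then be indexed: ans = 10
class(p)                  % 'cell'
class(v)                  % 'double'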

linux connect via ssh on cluster

ssh-keygen -t dsa // it will prompt for the name of the key file and a passphrase
1. Suppose the file name is mykey. It will generate two files: mykey and mykey.pub.
2. It will also generate a hidden folder .ssh. There will be a file named known_hosts in that folder.
3. Copy these two files into the .ssh folder:
   cd .ssh
   cp ../mykey ../mykey.pub .
4. Keep this folder with the 3 files for the future.
5. Once the admin gives you access to the server, type:
   ssh alamt@cluster.cbrc.kaust.edu.sa
   The first time it will prompt for a password, but after that it will not.

For the cluster:
1. cd user
2. cd DMF6
3. cd ../user/alamt
4. cp -rf DMF6 ../alamt/.
5. cp run_job_vlad.sh run_job_tanvir.sh
6. chmod a+x run_job_tanvir.sh
7. ./run_job_tanvir.sh node25 CGI tanvir_01
8. scp

scp: copy a file from remote --> local
======================
Open a terminal on the local machine: scp source destination
scp alamt@cluster.cbrc.kaust.edu.sa:/home/alamt/DMF6/CGI.tanvir_01.results/*.rank /home/tanviralam/tst

matlab normalization

Using mean and std dev
======================
function [featureIn, meanFeatIn, stdDevFeatIn] = mynorm_train(featureIn)
meanFeatIn = mean(featureIn,1);
stdDevFeatIn = std(featureIn,1,1);
noSample = size(featureIn,1);
for i=1:noSample
    featureIn(i,:) = (featureIn(i,:) - meanFeatIn) ./ stdDevFeatIn;
end
end

function [testFeatureIn] = mynorm_test(testFeatureIn, meanFeatIn, stdDevFeatIn)
noSample = size(testFeatureIn,1);
noInputFeat = size(testFeatureIn,2);
for i=1:noSample
    testFeatureIn(i,1:noInputFeat) = (testFeatureIn(i,1:noInputFeat) - meanFeatIn) ./ stdDevFeatIn;
end
end

Using range
===========
function [ N_feature, feature_range, feature_bases ] = normalize( features )
%NORMALIZE Summary of this function goes here
%   Detailed explanation goes here
% samples are in rows
for NoF = 1:size(features,2)
    F_min(NoF) = min(features(:,NoF));
    F_max(NoF) = max(features(:,NoF));
    feature_range(NoF) = (F_max(NoF)-F_min(NoF))/2;
    feature_ba
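
The range-based version above is cut off by the post preview. As a guess at how it continues, here is a small self-contained sketch that scales every feature to roughly [-1, 1] using a range and a base (the function name and the vectorised form are mine, not the original):

function [N_feature, feature_range, feature_bases] = normalize_range_sketch(features)
% samples are in rows; each feature is shifted to its range centre and scaled by half its range
F_min = min(features, [], 1);
F_max = max(features, [], 1);
feature_range = (F_max - F_min) / 2;   % half-width of each feature's range
feature_bases = (F_max + F_min) / 2;   % centre of each feature's range
noSample = size(features, 1);
N_feature = ( features - repmat(feature_bases, noSample, 1) ) ./ repmat(feature_range, noSample, 1);
end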

matlab feature ranking

Used function: rankfeatures (it treats each sample as a column)
====================================================
train = [trainFeature trainLabel];
% transpose, as rankfeatures takes samples as column vectors
[IDX, Z] = rankfeatures( trainFeature', trainLabel', 'Criterion', 'ttest' ); % ttest / entropy / etc.
topRankedFeature = ( size(trainLabel,1) ) / 2;
classify( testFeature( :, IDX(1:topRankedFeature) ), ...
          trainFeature( :, IDX(1:topRankedFeature) ), trainLabel, ...
          'diagquadratic' ) % linear / quadratic / diagquadratic etc.
% IDX is the list of indices to the rows in X with the most significant features.
% Z is the absolute value of the criterion used (see below)

MATLAB check unique string in file

function identifyDuplicate
clc;
uniqueSeq={}; dupSeq={};
index=1; uniqueIndex=1; dupIndex=1;
uniq=[]; dup=[];
isDuplicated = 0;
fid = fopen('1400M_from_287PS_287NS.ranked','r');
tline = fgetl(fid); % ******
while ischar(tline)
    consensusSeq = fgetl(fid); % Consensus: AAACC
    consensusSeq = upper(consensusSeq);
    curSeq = sscanf(consensusSeq,'%*s %s', [1, inf]);
    curSeq = upper(curSeq);
    fgetl(fid); % Threshold
    fgetl(fid); % Coverage
    fgetl(fid); % p-value
    fgetl(fid); % r1
    fgetl(fid); % r2
    fgetl(fid); % r3
    fgetl(fid); % r4
    isExist=0;
    for en=1:uniqueIndex-1
        exist = strcmp(curSeq, uniqueSeq{en});
        if exist == 1
            isDuplicated = 1;
            break;
        end
    end
    if( isDuplicated == 1 ) % already exists
        dupSeq{dupIndex} = {curSeq};
        dupIndex = dupIndex + 1;
        dup = [dup;index];
    else % not found

MATLAB cross validation

% use the built-in function
samplesize = size( matrix, 1);
c = cvpartition(samplesize, 'kfold', k); % returns the partition (indexes of each fold)

% output in the MATLAB console:
% K-fold cross validation partition
%              N: 10
%    NumTestSets: 4
%      TrainSize: 8  7  7  8
%       TestSize: 2  3  3  2

for i=1:k
    trainIdxs = find( training(c,i) ); % training(c,i): 1 means in train, 0 means in test
    testIdxs  = find( test(c,i) );     % test(c,i):     1 means in test,  0 means in train
    trainMatrix = matrix( trainIdxs, : );
    testMatrix  = matrix( testIdxs, : );
end

% now calculate performance
%% calculate performance of a partition
selectedKfoldSen=[]; selectedKfoldSpe=[]; selectedKfoldAcc=[];
indexSen=1; indexSpe=1; indexAcc=1;
if ( kfold == (P+N) ) % leave one out
    sensitivity = sum(cvtp) / ( sum(cvtp) + sum(cvfn) )
    specificity = sum(cvtn) / ( sum(cvfp) + sum(cvtn) )
    acc
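
Since the post is cut off, here is a compact self-contained sketch of the same idea (cvpartition plus a classifier inside the fold loop); the data, k, and the 'diaglinear' discriminant are placeholders, not from the original post:

% k-fold cross-validation sketch with assumed example data
feature = rand(20, 5);                 % 20 samples, 5 features
label   = [ones(10,1); zeros(10,1)];   % two classes
k = 5;
c = cvpartition(size(feature,1), 'kfold', k);
acc = zeros(k,1);
for i = 1:k
    trainIdx = training(c,i);          % logical index of training samples
    testIdx  = test(c,i);              % logical index of test samples
    predicted = classify(feature(testIdx,:), feature(trainIdx,:), ...
                         label(trainIdx), 'diaglinear');
    acc(i) = mean(predicted == label(testIdx));
end
meanAccuracy = mean(acc)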

MATLAB confusion matrix

% test_class & predicted_class must be the same dimension
% 'order' - describes the order of the labels. Here the labels are 'g' as positive and 'h' as negative
[C,order] = confusionmat( test_class(1:noSampleTest), predicted_class, 'order', ['g';'h'] )
tp = C(1,1); fn = C(1,2);
fp = C(2,1); tn = C(2,2);
sensitivity = tp / ( tp + fn )
specificity = tn / ( fp + tn )
accuracy = (tp+tn) / (tp+fn+fp+tn)
tpr = sensitivity
fpr = 1 - specificity
precision = tp / ( tp + fp )
fVal = (2*tpr*precision) / (tpr+precision)

MATLAB string manipulation

string cell array
================
ar = {'aa';'bbbbb'}

mat2str (convert a number to a string)
======================
parameter = '-s 0 -t 0 -c ';
for c=1:2
    nextp = [parameter mat2str(c)]
end

string comparison
================
sa='ab';
sb='aba';
t = strcmp(sa, sb)
if( t == 1)
    display('Yes, match');
else
    display('No match')
end

strmatch: find an exact string in a cell array
=======================
list = {'max', 'minimax', 'maximum', 'max'}
x = strmatch('max', list, 'exact')

find strings that start with a prefix in a cell array
=======================
list = {'max', 'minimax', 'maximum', 'max'}
x = strmatch('max', list)

String comparison
================
strmatch('ab', 'abc');  % returns 1
strcmp('ab', 'abc');    % returns 0

Find index of
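
The last example ("Find index of ...") is cut off. If it was about finding the index of a string, a sketch of the usual idioms (my guess, not the original code):

list = {'max', 'minimax', 'maximum', 'max'};
idx  = find( strcmp('max', list) )   % indexes of exact matches in a cell array -> [1 4]
pos  = strfind('minimax', 'max')     % position of a substring inside a string  -> 5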

MATLAB normalize train and test

matrixTrain = load('matrix.train');

function [matrixTrain, meanFeatIn, stdDevFeatIn] = mynorm_train(matrixTrain)
featureIn = matrixTrain(:,1:end-1);
featureOut = matrixTrain(:,end);
meanFeatIn = mean(featureIn,1);
stdDevFeatIn = std(featureIn,1,1);
meanFeatOut = mean(featureOut,1);
stdDevFeatOut = std(featureOut,1,1);
dlmwrite('normInfo', [meanFeatOut stdDevFeatOut], 'delimiter', '\t');
noSample = size(featureIn,1);
for i=1:noSample
    featureIn(i,:) = (featureIn(i,:) - meanFeatIn) ./ stdDevFeatIn;
    featureOut(i,:) = (featureOut(i,:) - meanFeatOut) ./ stdDevFeatOut;
end
matrixTrain = [featureIn featureOut];
end

matrixTest = load('matrix.test');

function [matrixTest] = mynorm_test(matrixTest, meanFeatIn, stdDevFeatIn, meanFeatOut, stdDevFeatOut)
noSample = size(matrixTest,1);
noInputFeat = size(matrixTest,2) - 1;
for i=1:noSample
    matrixTest(i,1:noInputFeat) = (matrixTest(i,1:noInputFeat) - meanFeatIn) ./ stdDevFeatIn;

matlab matrix to weka .arff format conversion

Input format (matrix entries are tab separated; the last column is the label. This is for a two-class problem; for multiclass you need to change the code in a few lines):
======================================================================
5.0  6.5  7.9  +1
6.6  8.9  6.1  -1

Code
=======
function matlabToarff
% convert a matrix into ARFF (Attribute-Relation File Format)
clc;
fNameData = 'seqLabel';
fNameARFF = 'seqLabel.arff';
fidARFF = fopen( fNameARFF, 'w');
matrix = load(fNameData);
feature = matrix( :, 1:end-1);
label = matrix( :, end);
noFeature = size(feature,2);
noSample = size(feature,1);
%%%%%%%%%% header
fprintf(fidARFF,'%s\n\n','@RELATION LNCRNAsequence');
for i=1:noFeature
    fprintf(fidARFF,'%s\t%d\t%s\n', '@ATTRIBUTE', i, 'NUMERIC');
end
fprintf(fidARFF,'%s\n\n','@ATTRIBUTE class {+1,-1}');
%%%%%%%%%% data
fprintf(fidARFF,'%s\n','@DATA');
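
The post is truncated right after the @DATA header; a sketch of how the data-writing loop presumably finishes (my reconstruction, reusing the variable names above):

% assumed continuation: one comma-separated row per sample, class value last
for s = 1:noSample
    for f = 1:noFeature
        fprintf(fidARFF, '%f,', feature(s,f));
    end
    fprintf(fidARFF, '%+d\n', label(s));   % +1 / -1 class value
end
fclose(fidARFF);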

matlab matrix to svm format conversion

Input format
============
28.7967,16.0021,2.6449,0.3918,0.1982,27.7004,22.011,-8.2027,40.092,81.8828,g

SVM format
=============
g    1:28.796700    2:16.002100    3:2.644900    4:0.391800    5:0.198200    6:27.700400    7:22.011000    8:-8.202700    9:40.092000    10:81.882800

Code
=======
fid = fopen('magic04.data','r');
raw_data = textscan(fid,'%f %f %f %f %f %f %f %f %f %f %c','delimiter',',');
data = [raw_data{1:10}];
class = raw_data{11};
fclose(fid);
svmTrain = fopen('magic04.data.svm','w');
noRow = size(data,1);
noCol = size(data,2);
for r=1:noRow
    label = class(r);
    fprintf(svmTrain,'%s\t', label);
    for c=1:noCol
        fprintf(svmTrain,'%d:%f\t', c, data(r,c));
    end
    fprintf(svmTrain,'\n');
end
fclose(svmTrain);

MATLAB file operation

============== TEXTSCAN =================
fid = fopen('magic04.data','r');
raw_data = textscan(fid,'%f %f %f %f %f %f %f %f %f %f %c','delimiter',',');
data = [raw_data{1:10}];
class = raw_data{11};
fclose(fid);

============== FGETL, FPRINTF =================
fid = fopen('positive.seq','r');
fidTrain = fopen('randomTrain.seq','w');
tline = fgetl(fid);
while ischar(tline)
    disp(tline);
    fprintf(fidTrain,'%s\n', upper(tline));
    tline = fgetl(fid);
end
fclose(fidTrain);
fclose(fid);

MATLAB decision tree classregtree, both classification and regression

matrixTrain = load('primate.train');
featureInTrain = matrixTrain( :, 1:end-1);
featureOutTrain = matrixTrain( :, end);
matrixTest = load('primate.test');
featureInTest = matrixTest( :, 1:end-1);
featureOutTest = matrixTest( :, end);

% single tree
% t = classregtree(featureInTrain, featureOutTrain, 'method', 'classification');
% predictedOut = str2double( eval(t, featureInTest) )

% tree bagger
bnew = TreeBagger(10, featureInTrain, featureOutTrain, 'Method', 'classification') % for 10 trees
predictedOut = predict(bnew, featureInTest)
predictedOut = str2double(predictedOut)

% inspect individual trees
t = bnew.Trees{1,1}
t = bnew.Trees{1,2}
t = bnew.Trees{1,3}
...
t = bnew.Trees{1,10}

MATLAB ANN artificial neural network train test

sample1: 1 2 3 4   label: A
sample2: 1 5 7 7   label: B

Every sample must be put in a column
=============================
featureIn
1 1
2 5
3 7
4 7
featureOut
A B

function [yPredict] = doBP(trainFeature, trainValue)
trainFeature = trainFeature'; % to fit the MATLAB format
trainValue = trainValue';     % to fit the MATLAB format
% version 2010a
net = newff(trainFeature, trainValue, [13 1], {'tansig' 'purelin'}); % tansig purelin
% version 2009a
% net = newff(trainFeature, trainValue, [13 1]);
% net.layers{1}.transferFcn = 'tansig';
% net.layers{2}.transferFcn = 'purelin';
net = init(net);
net.trainParam.epochs = 99999999;
net.trainParam.goal = 0.0000001;  % (stop training if the error goal is hit)
net.trainParam.lr = 0.000001;     % (learning rate, not the default for trainlm) [0.01]
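
The function above is cut off before the network is actually trained; a sketch of how it might continue and be used (testFeature is a hypothetical extra input, not in the original, and the sim call is my addition, consistent with the newff workflow):

% assumed continuation of doBP (testFeature: one sample per row, as in the caller's layout)
net = train(net, trainFeature, trainValue);   % train with backpropagation
yPredict = sim(net, testFeature');            % transpose so each test sample is a column
yPredict = yPredict';                         % back to one prediction per row
end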

C/C++ strtok string tokenizer

char delims[] = "\t \n";
char *result = NULL; // always holds the token serially
result = strtok( curline, delims ); // get first token
count = 0;
while( result != NULL ) {
    count++;
    switch(count){
        case 1:
            strcpy(chromosomename, result);
            break;
        case 2:
            strcpy(sStart, result);
            startIndex = atoi(result);
            break;
        default:
            fprintf(fppromotor, "%s\t", result);
            break;
    }
    result = strtok( NULL, delims ); // get next token
}

MATLAB load textscan or save a matrix

load
============
matrix = load('engine.train');

textscan
==========================
fd = fopen('magic04.data','r');
raw_data = textscan(fd,'%f %f %f %f %f %f %f %f %f %f %c','delimiter',',');
data = [raw_data{1:10}];
class = raw_data{11};

saving
=============
dlmwrite('engine.train', [trainFeatureIn trainFeatureOut], '\t');

C/C++ map [Object type]

class info
{
    char infoLine[MAXLINELEN];
    char infoMark[NOSPECIES];
    int count;
public:
    //info(){ strcpy(infoLine,""); strcpy(infoMark,""); for(int i=0;i
    // for a new entry
    info(char* li, int index, char mr){
        strcpy(infoLine, li);
        //cout << infoLine << endl;
        for(int i = 0; i < NOSPECIES; i++)
            infoMark[i] = '+';
        infoMark[index] = mr;
        count = 1;
    }
    // for an existing entry
    void insertmark(int index, char currentMark) { infoMark[index] = currentMark; count++; }
    void putEndmark(int index)  { infoMark[index] = '\0'; }
    int getCount()   { return count; }
    char *getLine()  { return infoLine; }
    char *getMark()  { return infoMark; }
};

map<string,info> mymap; // the key must be string; otherwise you have to overload the < operator for the info object
map<string,info>::iterator it;
it = mymap.find(ID);
if ( it != mymap.end() ) // found id
{
    it-

C/C++ map [primitive type]

void testMap()
{
    map<string,int> mymap;
    map<string,int>::iterator rit;

    mymap["x1"] = 100;
    mymap.insert( pair<string,int>("y2",150) );
    mymap["y2"] = 200;
    mymap["y2"] = 400; // this value will replace the previous one

    rit = mymap.find("xx");
    if ( rit != mymap.end() ) // found id
    {
        rit->second = 11111; // change existing content
    }
    else // not found
    {
        mymap.insert( pair<string,int>("xx",150) );
    }

    // show content:
    for ( rit = mymap.begin(); rit != mymap.end(); rit++ )
        cout << rit->first << " => " << rit->second << endl;
}

MATLAB som

% 1 4 5 7 -- sample 1
% 2 5 6 6 -- sample 2
% in cndx you will get the cluster number (position) of each training sample
% as SOM takes each sample as a column, we need to transpose the features before applying SOM

% ------- train -----------
trainFeature = [ 1 2 ; 4 5 ; 5 6 ; 7 6 ]; % two training samples with 4 features each, so each column is a sample
net = newsom( trainFeature', [2 2] ); % 4 clusters
net.trainParam.epochs = 100;
[net,tr,Y,E,Pf,Af] = train(net, trainFeature');
distances = dist( trainFeature, net.IW{1}' );
[d, cndx_train] = min(distances, [], 2); % cndx gives the cluster index, d gives the distance
cndx_train % the cluster number of each training sample

bookKeep = zeros(SOMd1 * SOMd2, 2); % one column for +ve, one column for -ve
for c=1:totTrain
    index = cndx_train(c);
    if(c <= totPosTrain)
        bookKeep(index,1) = bookKeep(index,1) + 1; % +ve sample
    else
        bookKeep(index,2) = bookKeep(index,2) + 1; % -ve sample
    end
end
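
To assign new (test) samples to the clusters found above, the same distance trick can be reused; a minimal sketch mirroring the training code (testFeature is a hypothetical matrix laid out like trainFeature):

% assumed test-assignment step
testFeature = [ 2 6 ; 4 5 ; 5 7 ; 6 6 ];          % hypothetical test samples, same layout as trainFeature
distancesTest = dist( testFeature, net.IW{1}' );  % distance of each sample to every SOM unit
[dTest, cndx_test] = min(distancesTest, [], 2);   % cluster index of each test sample
cndx_test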

MATLAB read random line if line size is fixed

NUM_SAMPLE = 6;
NOOFTRAINING = 6;
NOOFCHARPERLINE = 10;
NOOFOFFSETPERLINE = NOOFCHARPERLINE + 2; % the extra offset may be 2, depending on how the data was written into the file (line endings)
fid = fopen('D:\KAUST\2.Winter2010-11\SpliceSites\test.txt','r');
for i=1:NOOFTRAINING
    rowno = ceil( rand(1) * NUM_SAMPLE ); % ceil instead of round, so rowno stays in 1..NUM_SAMPLE
    offset = (rowno - 1) * NOOFOFFSETPERLINE;
    fseek(fid, offset, 'bof');
    line = fgetl(fid);
    frewind(fid);
    disp(rowno);
    % disp(offset);
    disp(line);
end
fclose(fid);

MATLAB read excel file

Basic mode: if Microsoft Excel is not installed
=============================
1. The Excel file must be saved in Excel 95 format; 97/98/2000 may not work.
2. Cannot read a range; it reads the whole file.

% read the first sheet
---------------------------------------
all = xlsread( 'filename' );

% or read the specified sheet in basic mode
---------------------------------
all = xlsread( 'filename', 'sheetname', '', 'basic' )
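
xlsread can also return the numeric part, the text part, and the raw cells separately; a small sketch (the file name is a placeholder):

[num, txt, raw] = xlsread('mydata.xls');
size(num)   % numeric block only
size(raw)   % everything, as a cell array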

MATLAB neural network

2010a
================
net = newff(trainFeature, trainValue, [8 1], {'tansig' 'purelin'});

2009a
===============
net = newff(trainFeature, trainValue, [8 1]);
net.layers{1}.transferFcn = 'tansig';
net.layers{2}.transferFcn = 'purelin';

save net
==================
save net;

%-------------------- Radial Basis Network, no train --------------------
net = newrb(trainFeature, trainValue, 0.0, 1, 100, 1);
save net;
% NO TRAIN. IT WILL AUTOMATICALLY CREATE NEW NEURONS

load net for testing
====================
load net;

MATLAB adding noise into data

Add a percentage of noise to each value
====================================
percent = .03;
sample = sampleMatrix(index, :);
noiseSample = [];
for i=1:instanceSize
    val = sample(i);
    smallPerc = val * percent;             % e.g. 3 percent of the value
    noise = smallPerc * ( rand(1) - 0.5 ); % .*T
    noisyVal = val + noise;
    noiseSample = [noiseSample noisyVal];
end
noiseMatrix = [ noiseMatrix ; noiseSample ];

Add a little noise to X
======================
rand( 'state', 0 )   % any number for the random seed
randn('state', 0 )   % any number for the random seed
small = 10e-11;
X = small * ( rand(size(X)) - .5 ) .* X + X;  % small is very small, ~10e-9

index = ceil( a + (b-a).*rand(1) ); % random index between a and b

Reading file in C

void readLine()
{
    FILE *fpTest;
    fpTest = fopen( "test.txt", "r" );
    int countRead = 0, countLine = 0, countCodeDataLine = 0, length = 0;
    if(fpTest == NULL){
        printf("error in opening test.txt file"); // perror
        exit(0);
    }

    while( !feof(fpTest) )
    {
        fgets( line, MAXATTRLINELEN, fpTest );
        countRead++;
        length = strlen(line);
        printf("[%d]:%s(lineLength=%d) \n", countLine, line, length);

        if(length > 0)
        {
            countLine++;
        }

        // count countCodeDataLine
        if(length == 0) ;
        else if(length == 1)
        {
            //printf(" %d  %d %d %d %d", line[0], '\n', '\r', '\r\n', '\n\r');
            if(line[0]==' ' || line[0] == '\n' || line[0] == '\t') ;
            else
                countCodeDataLine++;

Dynamic array in C/C++

Two-dim float with malloc
====================
float ** PC;
PC = (float**) malloc(DIM_PC_X * sizeof(float*)); // [DIM_PC_X][DIM_PC_Y]
for(int i = 0; i < DIM_PC_X; i++)
    PC[i] = (float*) malloc(DIM_PC_Y * sizeof(float));

Two-dim int with new
============================
int sizeX = 5, sizeY = 2;
int** ary = new int*[sizeX];
for(int i = 0; i < sizeX; ++i)
    ary[i] = new int[sizeY];

Running openmp in eclipse

As we know, to build an OpenMP program with g++ we have to compile it with the -fopenmp option. To configure this in Eclipse for a C++ project, you just need to add -fopenmp under the GCC C++ linker command options (and, if the pragmas are not picked up, under the GCC C++ compiler options as well).