MATLAB: check for unique strings in a file

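function identifyDuplicate
% Find records in a ranked motif file whose consensus sequence already
% appeared in an earlier record. Each record spans 9 lines:
%   ******            (separator)
%   Consensus: AAACC
%   Threshold, Coverage, p-value, r1, r2, r3, r4   (one line each)
% Writes the 1-based record indices of first occurrences to the file
% 'unique' and those of repeated sequences to the file 'dup'.
clc;

uniqueSeq = {};   % distinct consensus sequences seen so far
dupSeq    = {};   % sequences that repeat an earlier record (kept for inspection)

index = 1;        % 1-based index of the current record
uniq  = [];       % record indices of first occurrences
dup   = [];       % record indices of duplicates

fid = fopen('1400M_from_287PS_287NS.ranked', 'r');
if fid == -1
    error('identifyDuplicate:openFailed', 'Cannot open the input file.');
end

tline = fgetl(fid); % ******
while ischar(tline)

    consensusLine = fgetl(fid); % Consensus: AAACC
    if ~ischar(consensusLine)   % file ended right after a separator
        break;
    end

    % Skip the "Consensus:" label and keep the sequence, upper-cased so
    % that the comparison is case-insensitive.
    curSeq = upper(sscanf(consensusLine, '%*s %s', [1, inf]));

    fgetl(fid); % Threshold
    fgetl(fid); % Coverage
    fgetl(fid); % p-value
    fgetl(fid); % r1
    fgetl(fid); % r2
    fgetl(fid); % r3
    fgetl(fid); % r4

    % strcmp against a cell array compares curSeq with every element;
    % any() is true if the sequence was already seen. (The original used a
    % variable named "exist", which shadows the built-in exist function.)
    isDuplicated = any(strcmp(curSeq, uniqueSeq));

    if isDuplicated % already exists
        dupSeq{end+1} = curSeq;
        dup = [dup; index];
    else            % first occurrence
        uniqueSeq{end+1} = curSeq;
        uniq = [uniq; index];
    end

    tline = fgetl(fid); % next ******
    index = index + 1;
end

fclose(fid);

dlmwrite('unique', uniq, '\t'); % indices of unique entries
dlmwrite('dup',    dup,  '\t'); % indices of duplicate entries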
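The loop above compares each new sequence against every sequence kept so far, so the total work grows quadratically with the number of records. A minimal sketch of an alternative, assuming the consensus sequences have already been collected into a cell array (`seqs` below is hypothetical example data, not taken from the .ranked file), uses a `containers.Map` as a hash set keyed by the upper-cased sequence. It also records, for each duplicate, the index of the record it repeats (`firstOf` is a name introduced here for illustration).

```matlab
% Hypothetical example input: consensus sequences in record order.
seqs = {'aaacc', 'GGT', 'AAACC', 'ggt', 'TTA'};

% containers.Map used as a hash set: key = upper-cased sequence,
% value = index of the record where that sequence first appeared.
seen = containers.Map('KeyType', 'char', 'ValueType', 'double');

uniq    = zeros(0, 1);   % indices of first occurrences
dup     = zeros(0, 1);   % indices of repeated sequences
firstOf = zeros(0, 1);   % for each entry of dup, the record it repeats

for k = 1:numel(seqs)
    key = upper(seqs{k});          % case-insensitive, like the original
    if isKey(seen, key)
        dup(end+1, 1)     = k;
        firstOf(end+1, 1) = seen(key);
    else
        seen(key)      = k;
        uniq(end+1, 1) = k;
    end
end

disp(uniq.');    % values: 1 2 5
disp(dup.');     % values: 3 4
disp(firstOf.'); % values: 1 2
```

Inside identifyDuplicate, the same idea would replace the strcmp search over uniqueSeq: check isKey for each curSeq as it is read, and write uniq and dup with dlmwrite exactly as before.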

