Posts

Showing posts from September, 2014

Java pattern match, pattern search, and split example

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Check whether the input matches a pattern
Pattern curPat = Pattern.compile("tanvir" + "(.)+" + "tanvir");
Matcher mymatcher;
String input = "tanvirAAAAAAAtanvir";
mymatcher = curPat.matcher(input);
if (mymatcher.find()) {
    System.out.println("Matched pattern: " + input);
}

// List all hits with their positions in the input string
m = ConstantValue.patReplicaFantom.matcher(colNameUnReadable);
while (m.find()) {
    System.out.print("Start index: " + m.start());
    System.out.print(" End index: " + m.end());
    System.out.println(" Found: " + m.group());
}

// Split an input according to a pattern
Pattern patFastaHeade = Pattern.compile("[>_]+");
String text = ">aaaa";
String tmp[];
tmp = patFastaHeade.split(text);
for( int i=
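The excerpt above is cut off, so here is a small self-contained sketch of the same two ideas: listing every hit with its position, and splitting on a pattern. The class name RegexSketch, the helper findAll, and the sample strings are made up for illustration and are not from the original post. One thing worth knowing: when the input starts with a separator (as ">aaaa" does), Pattern.split returns an empty string as the first element.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexSketch {
    // Hypothetical helper: returns "start-end:text" for every match of pat in input.
    static List<String> findAll(Pattern pat, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = pat.matcher(input);
        while (m.find()) {
            // start() is inclusive, end() is exclusive
            hits.add(m.start() + "-" + m.end() + ":" + m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // List all runs of 'A' with their positions
        Pattern runOfA = Pattern.compile("A+");
        System.out.println(findAll(runOfA, "tanvirAAAtanvirAA"));

        // Split on runs of '>' or '_'; a leading separator yields a leading ""
        Pattern sep = Pattern.compile("[>_]+");
        String[] parts = sep.split(">aaaa_bbb");
        System.out.println(Arrays.toString(parts));
    }
}
```

If the leading empty element is not wanted, start the loop over the split result at index 1 when the first element is empty.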

CAGE / RNA-seq normalization

Various normalization techniques in different tools

1. DESeq

Input: an N by C matrix.
N rows: one row per gene.
C columns: each column represents one replicate of one condition.
Each cell: the integer tag count, or whatever value you want to normalize.

Algo:
1. From the count matrix, calculate the geometric mean (GM) of each row.
2. Divide each value by the GM of its row. These ratios give the size factor (SF) of each column; DESeq takes the median of the ratios in a column as that column's SF.
3. Divide each count value by the SF of its column.

MATLAB code

function doDESEQnorm
mymat=[ 0 0 0 0 0 0 1;
        92 161 76 70 140 88 70;
        5 1 0 0 4 0 0;
        0 2 1 2 1 0 0 ];
% This shows four genes in two conditions, untreated and treated.
% The first four columns are the "untreated" replicates and the last
% three columns are the "treated" replicates.
sizeRow = size(mymat,1);
sizeCol = size(mymat,2);
%% estimateSizeFactors
%% 1 - Find Geometric Mean (GM) of Each row
gmRow = zeros(1,sizeRow);
for i=1:si
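Since the MATLAB listing is cut off here, the three steps can also be sketched in Java. This is only an illustration of the median-of-ratios idea, not the original code: the class name DeseqSketch and its method names are hypothetical, and the sketch assumes that rows containing a zero are skipped when estimating size factors, because their geometric mean is 0 and the ratio is undefined.

```java
import java.util.Arrays;

class DeseqSketch {
    // Median of a non-empty array (the array is copied, not modified).
    static double median(double[] values) {
        double[] v = values.clone();
        Arrays.sort(v);
        int n = v.length;
        return (n % 2 == 1) ? v[n / 2] : (v[n / 2 - 1] + v[n / 2]) / 2.0;
    }

    // Steps 1 and 2: per-row geometric mean, then per-column median of count / GM.
    static double[] sizeFactors(double[][] counts) {
        int nRow = counts.length, nCol = counts[0].length;
        double[] gm = new double[nRow];
        int usable = 0;
        for (int i = 0; i < nRow; i++) {
            double logSum = 0.0;
            for (int j = 0; j < nCol; j++) logSum += Math.log(counts[i][j]);
            gm[i] = Math.exp(logSum / nCol); // 0.0 when the row contains a zero
            if (gm[i] > 0) usable++;
        }
        if (usable == 0) throw new IllegalArgumentException("every row contains a zero");
        double[] sf = new double[nCol];
        for (int j = 0; j < nCol; j++) {
            double[] ratios = new double[usable];
            int k = 0;
            for (int i = 0; i < nRow; i++) {
                if (gm[i] > 0) ratios[k++] = counts[i][j] / gm[i]; // assumption: skip zero-GM rows
            }
            sf[j] = median(ratios);
        }
        return sf;
    }

    // Step 3: divide every count by the size factor of its column.
    static double[][] normalize(double[][] counts) {
        double[] sf = sizeFactors(counts);
        double[][] out = new double[counts.length][counts[0].length];
        for (int i = 0; i < counts.length; i++)
            for (int j = 0; j < counts[i].length; j++)
                out[i][j] = counts[i][j] / sf[j];
        return out;
    }

    public static void main(String[] args) {
        // Same matrix as in the MATLAB listing above
        double[][] mymat = {
            {0, 0, 0, 0, 0, 0, 1},
            {92, 161, 76, 70, 140, 88, 70},
            {5, 1, 0, 0, 4, 0, 0},
            {0, 2, 1, 2, 1, 0, 0}
        };
        System.out.println(Arrays.toString(sizeFactors(mymat)));
        System.out.println(Arrays.deepToString(normalize(mymat)));
    }
}
```

With this example matrix only the second row has no zeros, so under the assumption above the size factors are driven entirely by that one gene. Real data sets have many usable rows, which is what makes the median robust.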