
CAGE / RNA-seq normalization


Various normalization techniques in different tools

1. DESeq


Input: an N-by-C matrix.
 N rows: one row per gene.
C columns: one column per sample, i.e. each replicate of each condition.
Each cell: the integer tag count for that gene in that sample (or whatever other value you want to normalize).

Algo:
1. From the count matrix, calculate the geometric mean (GM) of each row.
2. Divide each value by the GM of its row. The size factor (SF) of each column is the median of these ratios in that column.
3. Divide each count value by the SF of its column.
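
The three steps above can also be written as one set of formulas. The notation is not from the original post and is only assumed here for illustration: k_ij is the count in row (gene) i and column j, g_i is the geometric mean of row i, and s_j is the size factor of column j.

```latex
\[
g_i = \Bigl(\prod_{j=1}^{C} k_{ij}\Bigr)^{1/C},
\qquad
s_j = \operatorname*{median}_{i \,:\, g_i > 0} \frac{k_{ij}}{g_i},
\qquad
k_{ij}^{\mathrm{norm}} = \frac{k_{ij}}{s_j}
\]
```

Written this way, a row that contains a zero has g_i = 0 and does not contribute to any size factor. The MATLAB code below handles zeros differently: it takes each row's geometric mean, and each column's median, over the non-zero entries only.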

MATLAB code

function doDESEQnorm

mymat=[   
    0    0    0    0    0    0    1;
    92    161    76    70    140    88    70;
    5    1    0    0    4    0    0;
    0    2    1    2    1    0    0 ];
   
% Four genes (rows) in two conditions, untreated and treated: the first four columns
% are "untreated" replicates and the last three columns are "treated" replicates.


sizeRow = size(mymat,1);
sizeCol = size(mymat,2);

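%% estimateSizeFactors, step 1: geometric mean (GM) of each row
% Only the non-zero entries of a row are used here. DESeq itself takes the GM over
% all samples, so there a gene with a zero count in any sample has GM 0 and is skipped.
% geomean requires the Statistics and Machine Learning Toolbox.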
gmRow = zeros(1,sizeRow);
for i=1:sizeRow
   
    curRow = mymat(i,:)';
    nzId = find(curRow);
    tmpVal = curRow(nzId);
    gm = geomean(tmpVal);
    gmRow(i) = gm;
   
end
mymat3= mymat;

%% estimateSizeFactors, step 2: divide each row by its GM
afterDiv = mymat3;
for i=1:sizeRow
    afterDiv(i,:) = mymat3(i,:) / gmRow(i);
end



%% estimateSizeFactors, step 3: size factor = median of the non-zero normalized values in each column
sizeFactor = zeros(1,sizeCol);
for col=1:sizeCol
   nzRow = find( afterDiv(:,col) );
   nzVal = afterDiv(nzRow,col);
   nzVal = sort(nzVal);
  
%    numberNZ = size(nzVal,1);
%    if rem(numberNZ, 2) ==0
%        idx = numberNZ/2 ;
%    else
%        idx = fix(numberNZ/ 2) +1;
%    end
%    sizeFactor(col)= nzVal(idx,1);

   sizeFactor(col) = median(nzVal);
  
end
sizeFactor

%% counts(sf, normalize=T): do the normalization (divide each column by its size factor)
normValue = mymat;
for col=1:sizeCol
    normValue(:,col) = normValue(:,col)./sizeFactor(col);   
end

end
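
The loops above can also be written without explicit iteration. What follows is only a sketch, not part of the original function: the variable names (counts, logCounts, logGM, ratios) are made up for illustration, it assumes a MATLAB release with implicit expansion (R2016b or later) and the 'omitnan' option of mean and median, and it keeps the same convention of ignoring zero entries.

```matlab
% Hypothetical vectorized variant of doDESEQnorm (same non-zero convention).
counts = [ 0   0   0   0   0   0   1;
          92 161  76  70 140  88  70;
           5   1   0   0   4   0   0;
           0   2   1   2   1   0   0 ];

% Step 1: geometric mean of the non-zero entries of each row,
% computed as exp(mean(log(x))) with zeros masked out as NaN.
logCounts = log(counts);
logCounts(counts == 0) = NaN;
logGM = mean(logCounts, 2, 'omitnan');   % one value per row (gene)

% Step 2: divide each row by its geometric mean.
ratios = counts ./ exp(logGM);           % implicit expansion over columns
ratios(counts == 0) = NaN;               % zeros do not take part in the median

% Step 3: size factor of each column = median of its non-zero ratios.
sizeFactor = median(ratios, 1, 'omitnan');

% Normalization: divide each column by its size factor.
normValue = counts ./ sizeFactor;
```

Working in log space avoids overflow when a row has many large counts, and it does not need geomean from the Statistics and Machine Learning Toolbox.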
