Various normalization techniques in different tools
1. DESeq
Input: an N by C matrix
N rows: one row per gene (N genes)
C columns: one column per sample, i.e. one replicate of one condition
Each cell: the integer tag count for that gene in that sample (or whatever value you want to normalize).
Algo:
1. From the count matrix, calculate the geometric mean (GM) of each row.
2. Divide each value by the GM of its row.
3. For each column, take the median of these ratios. This median is the size factor (SF) of that column.
4. Divide each count value by the SF of its column.
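Written as a formula, the steps above correspond to the median-of-ratios size factor. The symbols k_{ij} (count for gene i in sample j), s_j (size factor of sample j) and \tilde{k}_{ij} (normalized count) are notation introduced for this sketch, not taken from the post:

```latex
s_j = \operatorname{median}_{i} \frac{k_{ij}}{\left( \prod_{v=1}^{C} k_{iv} \right)^{1/C}},
\qquad
\tilde{k}_{ij} = \frac{k_{ij}}{s_j}
```

In this form the geometric mean runs over all C samples, so a gene with a zero count in any sample has a zero denominator and has to be left out of the median. The MATLAB code below instead takes the geometric mean over the non-zero values of each row only, so its size factors can differ from this formula on sparse data.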
MATLAB code
function doDESEQnorm
mymat=[
0 0 0 0 0 0 1;
92 161 76 70 140 88 70;
5 1 0 0 4 0 0;
0 2 1 2 1 0 0 ];
% Four genes in two conditions, untreated and treated: the first four columns are "untreated" replicates and the last three columns are "treated" replicates.
sizeRow = size(mymat,1);
sizeCol = size(mymat,2);
%% estimateSizeFactors %% 1 - Find Geometric Mean (GM) of the non-zero values in each row
% Note: geomean requires the Statistics and Machine Learning Toolbox.
gmRow = zeros(1,sizeRow);
for i=1:sizeRow
curRow = mymat(i,:)';
nzId = find(curRow);
tmpVal = curRow(nzId);
gm = geomean(tmpVal);
gmRow(i) = gm;
end
mymat3= mymat;
%% estimateSizeFactors %% 2 - Divide each value by the GM of its row
afterDiv = mymat3;
for i=1:sizeRow
afterDiv(i,:) = mymat3(i,:) / gmRow(i);
end
%% estimateSizeFactors %% 3 - SizeFactor: Take the median of the non-zero normalized values in each column
sizeFactor = zeros(1,sizeCol);
for col=1:sizeCol
nzRow = find( afterDiv(:,col) );
nzVal = afterDiv(nzRow,col);
nzVal = sort(nzVal);
% numberNZ = size(nzVal,1);
% if rem(numberNZ, 2) ==0
% idx = numberNZ/2 ;
% else
% idx = fix(numberNZ/ 2) +1;
% end
% sizeFactor(col)= nzVal(idx,1);
sizeFactor(col) = median(nzVal);
end
sizeFactor
%% counts(sf, normalize=T) ; %% Do Normalization
normValue = mymat;
for col=1:sizeCol
normValue(:,col) = normValue(:,col)./sizeFactor(col);
end
normValue % display the normalized counts (otherwise the result is discarded)
end
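The loops above can also be written in a more compact, partly vectorized form. The following is only a sketch under the same convention as the function above (zeros are ignored both in the row geometric means and in the column medians). The names counts, isNz, logCounts and ratios are illustrative, and exp(mean(log(x))) stands in for geomean so that no toolbox is needed:

```matlab
% Sketch: compact variant of the size-factor calculation above.
% Assumption: zeros are ignored, as in the loop-based function.
counts = [ 0   0  0  0   0  0  1;
          92 161 76 70 140 88 70;
           5   1  0  0   4  0  0;
           0   2  1  2   1  0  0];

isNz = counts > 0;                      % mask of non-zero cells

% Geometric mean of the non-zero values in each row: exp(mean(log(x))).
% A row with no non-zero value would give NaN here (0/0).
logCounts = zeros(size(counts));
logCounts(isNz) = log(counts(isNz));
gmRow = exp(sum(logCounts, 2) ./ sum(isNz, 2));   % N-by-1

% Divide each value by the GM of its row.
ratios = bsxfun(@rdivide, counts, gmRow);

% Size factor of a column = median of its non-zero ratios.
numCol = size(counts, 2);
sizeFactor = zeros(1, numCol);
for col = 1:numCol
    sizeFactor(col) = median(ratios(isNz(:, col), col));
end

% Divide each column by its size factor.
normValue = bsxfun(@rdivide, counts, sizeFactor);

disp(sizeFactor);
disp(normValue);
```

bsxfun is used rather than implicit expansion so that the sketch also runs on older MATLAB releases. The per-column loop is kept because each column has a different set of non-zero rows.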