Monday, November 26, 2012

Matlab plot figure modifying tips pixed dpi relationship


Matlab  plot figure all basic operation
===========================

x = [1:5];
y = [7 8 1 9  10]

h =figure

plot(x,y)
hold on
plot(x, y+5);

% change axis properties

set(gca,'box','off'); % remove box around figure
set(gca,'FontSize',fontsize_tick-1); % set tick size in figure
set(gca,'LooseInset',get(gca,'TightInset'));  % if you want to remove space on right of figure
xlabel(commonLabelWindow,'fontsize' , fontsize_label);
ylabel('Average AT-skew','fontsize' , fontsize_label_special);
xlim ( [ min(myX)   max(myX)])
ylim ( [ min( [myY1  myY2  ])   max([ myY1  myY2  ])  ])



% change size of figure

http://www.mathworks.com/help/matlab/creating_plots/positioning-figures.html




%  set size of figure's "drawing" area on screen ( 5.5 cm  width ) by ( 4.0 cm height)

set(gcf, 'Units','centimeters', 'Position',[0 0 5.5 4.0])


% change resolution 600 dpi

http://www.mathworks.com/help/matlab/ref/print.html

set(gcf,'PaperPositionMode','auto'); % you MUST add this to make sure output wil'b same as screen
print('-depsc','skewAT.eps' , '-r600');
print('-dtiff','skewAT600.tif' , '-r600');


% save as file using default resolution

 saveas(h,'skewAT.eps','epsc2');
 saveas(h,'skewAT.fig','fig');
 saveas(h,'skewAT.png','png');



% close  figure

close(h);

end


% using transparent patch

[ciC_withRpt,bootstat] = bootci(noBootstrap, {myfnc,matCoding});
mypatchY = [ciC_withRpt(1,:)  fliplr(ciC_withRpt(2,:))]; 

% flip to make joining patch

mypatchX = [ myX fliplr(myX) ]; % flip to make joining patch
 

mypatch = patch(mypatchX , mypatchY,color_C_withRpt_patch,'edgecolor','none', 'FaceAlpha' , alphaVal );


% using transparent patch may overlap with axis;  border of figure disappear
 solution: shift camera little bit

ct = camtarget;
camtarget([ct(1)+0.001*ct(1) ct(2)+0.001 ct(3)]);





 http://www.mathworks.com/support/solutions/en/data/1-5TS5P4/index.html?product=ML&solution=1-5TS5P4

Solution:

This is an issue with the 'OpenGL' renderer installed in the system. When using transparency in a plot the figures renderer is automatically set to OpenGL. This renderer often clips the axes box and tick marks.

To work around this issue, you can make the axes box visible by slightly changing the camera target of the axes:
ct = camtarget;
camtarget([ct(1)+0.001*ct(1) ct(2)+0.001 ct(3)]);
To make the tick marks at the right side and top visible you can overlay another axes object with the same settings as the other axes, empty its ticklabels and set the axis locations to the right and top of the axes:
ax2 = copyobj(gca,gcf);
set(ax2,'XAxisLocation','top','XTickLabel','','YAxisLocation','right','YTickLabel','','Color','none')




% Determine image width by  Resolution at paper size

Very good tutorial :
http://auctionrepair.com/pixels.html

Input:
W = width of a printing area of a journal page (e.g. A4 W = 17.8 cm)
H = height of a printing area of a journal page (e.g. A4 H = 24 cm)
D = dots per inch (the required resolution )

Output:
Picture pixel and picture width

Calculation: 

given,

W = 17cm;
D= 600 dpi;

calculate: 

// change only width
Win = W/2.54;
imW = floor(Win*D);
n = 3; // No of figure per row
imW = floor(imW/n)*n;
imWi = imW/n;
im1 = imresize (im1Big, [NaN imWi]);

// change both width, height 
H = 24;
Hin = H/2.54;
imH = floor(Hin*d);
nV=5;
imH = floor(imH/nv)*nv;
imHi = imH/nv;
im1 = imresize(im1Big, [imHi imWi]);


Merging N figure in  rows( Our W=17.8 cm width of printable area for a journal page , n = 3 number of image per row)

function mymergeTanvir

im1Big = im2double (imread('skewCG_CPS.tif'));
im1 = imresize(im1Big, [NaN 1338]);
size(im1);
clearvars im1Big;

im2Big = im2double (imread('skewAT_CPS.tif'));
im2 = imresize(im2Big, [NaN 1338]);
size(im2)
clearvars im2Big;

im3Big = im2double (imread('skewCG_CPS.tif'));
im3 = imresize(im3Big, [NaN 1338]);
size(im3);
clearvars im3Big;


% [  im1 im2 im3;  im4 im5 im6]; % if you want 6 figures in 2 row
im = [im1 im2 im3 ]; % if you want 3 figures in 1 row

size(im);
imshow(im);
imwrite(im, 'test2.tif' ); % create the combined image

print -dtiff -r600 test.tif ; % export to disk at desired resolution
print -dpdf -r600 test.pdf ; % export to disk at desired resolution

% set(gcf, 'paperunits', 'centimeters', 'paperposition', [0 0 15 15]);
% the number after 'm' is magnification - quality of the pdf
% export_fig('combinedIm.pdf', '-m2', gcf);

end

Thursday, November 15, 2012

R tutorial


Install R in linux
============
In CRAN home page, the latest version is not available. So, in fedora,
Open the terminal
  • yum list R  --> To check the latest available version of r
  • yum install R --> install R version
  • yum update R --> update current version to latest one

0 find help
============
?exact topic name (  i.e.   ?mean )


0.0 INSTALL 3rd party package
 ====================
install.packages('mvtnorm' , dependencies = TRUE , lib='/home/alamt/myRlibrary/')   #  install new package

BED file parsing (Always use read.delim it is the best)


library(MASS)
#library(ggplot2)

dirRoot="D:/research/F5shortRNA/TestRIKEN/Rscripts/"
dirData="D:/research/F5shortRNA/TestRIKEN/"
setwd(dirRoot)
getwd()

myBed="test.bed"
fnmBed=paste(dirData, myBed, sep="") # ccdsHh19.bed   tmp.bed




## Read bed use read.delim - it is the  best

mybed=read.delim(fnmBed, header = FALSE, sep = "\t", quote = "\"", dec = ".", fill = TRUE, comment.char = "");

## PARSE BED column

noGene=dim(mybed)[1]
allChr=as.matrix(mybed[1])
allStart=as.matrix(mybed[2])
allEnd=as.matrix(mybed[3])
allWidth=allEnd - allStart

## Iterate over rows of bed

for(i in myStart: myEnd) {
 
  curChr = allChr[i]
  curStart = allStart[i]
  curWidth = allWidth[i]
 
  if(i%%1000 ==0){
    print("Now calculating for geneno: ");
    print (i)
  }
 
}

## Find the index of row matching value a particular column

matchIdx = which( allChr == 'chr19')

mydata=read.delim(fnmCSV, header = FALSE, sep = ",", quote = "\"", dec = ".", fill = TRUE, comment.char = "");
  mydataRiyadh = mydata[ mydata$V6=="Riyadh Province" , ]
  mydataMakkah = mydata[ mydata$V6=="Makkah Province" , ]
 

0.1 Set path in .libPath() 3rd party package
==============================
.libPaths() -- show the path R will search for library

to append a new path:

.libPaths( c( .libPaths(), "/NewPath") )

for example

.libPaths( c( .libPaths(), "/home/alamt/R/x86_64-unknown-linux-gnu-library/3.0/") )

CBRC cluster R .libPath() should be
========================

 [1] "/home/alamt/R/x86_64-pc-linux-gnu-library/2.15"  
[2] "/home/apps/R/3.0.0/lib64/R/library"              
[3] "/home/alamt/R/x86_64-unknown-linux-gnu-library/3.0"




0.1.1. Call R code from terminal , shell script
====================

in shell type:
Rscript myCode.R tanviralam 99

myCode.R
------------------
args  =commandArgs(TRUE)
myWorkDir = as.character(args[1])
myval = as.double(args[2])
print(myval)
setwd(myWorkDir)




0.2. Use the 3rd party package
====================

library(mvtnorm)   #  use external package

0.3. Complete example of reading a tab delimitted file and build feature and label factor, In R always use factor instead of matrix
===============================================

data = read.delim( "feature_2_111.feature" , header= FALSE, sep="\t");
mymat =  as.matrix(data);

totRow =-nrow (data)
totFeat = ncol (data)
print(totRow);
print(totFeat);

myFeatureFrame = data[ c(1:totRow), c(1:totFeat) ] ;
myLabelFrame = data[ c(1:totRow), c(totFeat+1) ] ;


0.1. Create DataFrame Manually

// Create a dataframe with two variable in two columns: ExprValue and ClusterNo    
val=as.matrix(valforplot)
cno = as.matrix(clusternoforplot)
dfForPlot = data.frame( clusterno=cno ,  Exprvalue=val)
colnames(dfForPlot)
# check the value of first variable
dfForPlot$ExprValue

1. Create Matrix Manually


B = matrix(   c(2, 4, 3, 1, 5, 7),   nrow=3,    ncol=2) ;

http://www.r-tutor.com/r-introduction/matrix/matrix-construction


1. Read Tab delimited file ( Read bed file )

mybed=read.delim(fnmBed, header = FALSE, sep = "\t", quote = "\"", dec = ".", fill = TRUE, comment.char = "");

noGene=dim(mybed)[1]
allChr=as.matrix(mybed[1])
allStart=as.matrix(mybed[2])
allEnd=as.matrix(mybed[3])
allWidth=allEnd - allStart



resultFinal=NULL;
print("Calculation of PhastCons score started ...")
for(i in 1:noGene) {
 
  curChr = allChr[i]
  curStart = allStart[i]
  curWidth = allWidth[i]

  }
 
}



1. Read as frame and convert to matrix


data = read.delim( "test.dat" , header= FALSE, sep="\t")
myMat =  as.matrix(data)



// read as tab delimited file
mydata = read.delim( fnmData ,header=TRUE,  sep="\t")

 2. Write matrix into file

=====================

library(MASS)
write.matrix(mymat, file = "data.in", sep = "\t")


2.a Write data frame into file

==========================

results = data.frame();
uniqTF=unique(data$V9);

for (tf in uniqTF)
{
    tmp = subset( data, data$V9==tf);
    results = rbind( results , data.frame(tmp$V1,tmp$V2,tmp$V3,tmp$V4,tmp$V5,tmp$V6,tmp$V7,tmp$V8,tmp$V9 ) );

}


write.table(results, file = "track.tf.gff",col.names=F, row.names=F, quote=FALSE, sep = "\t");


How to remove double quote in writing ( use quote=FALSE)

write.table(results, file = "track.tf.gff",col.names=F, row.names=F, quote=FALSE, sep = "\t");



2.b. Write output image into file
==========================

par(mfrow=c(4,3))  // divide canvas into 4 by 3 cells

png(filename="./netsimQ05.png",   bg="white")  ;  # create device
plot(apres1a) ; # make your plot 
dev.off() # stop device


figname='GM12878_All.eps'
postscript(file=figname, onefile=FALSE, horizontal=FALSE , bg="white")dev.off()

3. Convert to matrix

=================

mymat = as.matrix(myframe)

4. dimension of matrix

===============
dim(mymat)
norow = dim (mymat)[1]  OR   nrow(mymat)
nocol  = dim(mymat) [2]   OR  ncol(mymat)



4. Common Matrix Operation
====================================
# Removing rows with all zeros
mat = sample(0:1, 49, TRUE)
dim(mat) = c(7, 7)
mat[c(1,4), ] = 0
ind = rowSums(mat) != 0
newMat = mat[ind,]

# Fill matrix by same value
a= replicate(times , value)
a = replicate(10, 9999) creates a structure with ten values, where each value equals to 9999.

# Find index of all positions matching a value

matchIdx = which( mymatrix == value)  # it returns the index of matrix entry whose value match your value

mydata=read.delim(fnmCSV, header = FALSE, sep = ",", quote = "\"", dec = ".", fill = TRUE, comment.char = "");
  mydataRiyadh = mydata[ mydata$V6=="Riyadh Province" , ]
  mydataMakkah = mydata[ mydata$V6=="Makkah Province" , ]
 


# Transpose of a matrix
mymatTrans = t(mymat)

# Subselect ROW based on Value in column

mat1common = mat2[ mat1$V1 %in%  ma2$V1, ]


# Sort Matrix based on a column

myMatrixSorted = myMatrix [order( myMatrix$V1),]


5. common statistics
===============

mean of columns :  
colMeans(mymat)

mean of allvalue : 
mean(mymat)

variance : 

var(data)

range : 
range(data[ , 1])  # range of first column

Speakman/Pearson correlation coefficient of rows of matrix:

library(Hmisc)
mymat= as.matrix(mtcars) # here mtcars is the data frame having 32 row, 11 column

mymatTrans=t(mymat) # As these calculate correlation between columns


## cor:  measure each pair column cor-relation between 2 matrices
resultCorr = cor( mymatTrans1 , mymatTrans2 , method="spearman")
resultCorr[is.na(resultCorr)] =  0.0
finalResult = diag(resultCorr) # select each entry of diagonal(i,i)  as result


## rcorr: measure all pair column cor-relation
resultCorr = rcorr( mymatTrans , type="spearman")

resultCorr$r # show the correlation
resultCorr$P  # shows the p-value of cor-relation



dlmwrite ("mycoeff.txt" , resTrans$r )

P-value correction ; P-value adjustment for multiple adjustment
 http://www.r-bloggers.com/example-9-30-addressing-multiple-comparisons/
 https://stat.ethz.ch/R-manual/R-devel/library/stats/html/p.adjust.html

p.adjust(p, method = p.adjust.methods, n = length(p))
p.adjust.methods
# c("holm", "hochberg", "hommel", "bonferroni", "BH", "BY",   "fdr", "none")
Example
pvals = c(.001, .001, .001, .02, .22, .59, .87)
BONF = p.adjust(pvals, "bonferroni")
BH = p.adjust(pvals, "BH")
res = cbind(pvals, BH=round(BH, 3), BONF=round(BONF, 3))

6 sorting   and ordering to get sorted index:
================================
sortedVal  = sort ( data )
sortedVal  = rev  ( sort (data )  )

sortedId = order ( data , decreasing=TRUE )
sortedId  = order ( data ,  decreasing= FALSE )




6. Create vector of object provided into list : use c to create that vector
====================================
mynames = c ( 'ta' , 'al')
myvect =  c ( 5 , 3 ,89)
myvect = c ( myvect , 90)   # concate new element

insert element into vector

 myVect=c();
 
  for(i in 1: N)
  {
       
    cur =AllValue[i]
    if(i==1){
        myVect= c(cur)
    }else
    {
        myVect= append(myVect, cur)
    }
   
  }


7 Query
===================

find most frequent word and its index in array
---------------------------
mostFreq =  myarr="myarr" table="table" which.max="which.max"
mostFreqIndex =
myarr="=mostFreq,arr.ind=TRUE)"

8. Function calling
=========================

callee.r
--------------
myfnc1 <- function ( fname, totA ) {
    print(  paste("inside callee:" , fname) );
    print(  paste("inside callee:" , totA));
    return ( totA) # You must use (  ) to return value
}

caller.r
------------------
setwd('.')
source('callee.r');
y  <-myfnc1("tanvir", 5 );

call from command prompt :
-------------------------------------------
/usr/bin/Rscript caller.r  myval1  myval2


Include functions from another file
======================
source('filename.r');


pass command line argument to rscript call
===============================
args  < commandArgs (TRUE)
print(args[1])
print(args[2])



9. String Operation
==============



String Array
=========
cellName=c("HM1","HM2" );
print(cellName[i] );

String concat
===========

a="hell"
b="fire"
c=paste( a, b,sep = "")  

String Length
=============
word=”apple”
nchar(word)
Substring
=======
substr(word,start=2,stop=5)

10. loop
===========

cellName=c("HM1","HM2" );
 for(i in 1:length(cellName)) {
    print(cellName[i] );
    filePath=paste(myDir , '/data/',cellName[i] , sep="");
    print(filePath)
}
11. If - else if - else (DON'T PUT SPACE BEFORE else)
==========================================
val=6
if(val>10)
{
    print("it is above 10");
}else if (val >5)
{
    print("it is above 5");
}else
{
    print("it is default");
}

Using Hash / Hashmap / Key Value pair in R [Values are not inserted serially]




use library
=======
 library(hash)

# create my hash
=============
myHashCagt <- br="" hash="">


# insert new values in to hash
====================
  .set( myHashCagt, keys=curKey, values=curCluster )

# Find the expression values  of those  key
================================
valExpr = values(myHashCagt, keys=curKey)
valAsMat = as.matrix(valExpr) # N row, 1 column

12. Fisher 's exact test : package : stats
======================
myContin = matrix(c(1,4,7,4), nrow = 2);
fisher.test(myContin ,alternative = "two.sided" ) ; // two sided

fisher.test(myContin ,alternative = "greater" ) ; // right-hand sided
fisher.test(myContin ,alternative = "less" ) ; // left-hand sided

13. GRAPH PLOT
==============
http://rfunction.com/code/1203/cex-guide.pdf

Best tutorial: with details example 

http://www.programmingr.com/content/controlling-margins-and-axes-oma-and-mgp/

http://www.stat.auckland.ac.nz/~paul/Talks/Rgraphics.pdf

4 BASIC PARAMETER
par(cex.axis=1, cex.lab=2, cex.main=1.2, cex.sub=1)

 cex.axis= fix font size of axis tick
 cex.lab= fix font size size of axis label
 cex.main= fix size of main header font size
 cex.sub= fix size of sub header font size

http://www.r-bloggers.com/setting-graph-margins-in-r-using-the-par-function-and-lots-of-cow-milk/

par(mar=c(5.1,4.1,4.1,2.1)  , mgp(3,1,0) )

# mar ( bottomMargin, leftMargin, topMargin, rightMargin)
# mgp ( distanceBorderTo_AxisLabel, distanceBordeTo_tickMark, distanceBordeTo_tick)

http://rfunction.com/archives/1302

#Blank space line before drawing the box/axisLine of figure. It will contain label, tick etc.
mar=c(5.1,4.1,4.1,2.1)

mgp : Distance from the axis line to( the axis title, axis labels and axis line). default value of mgp is c(3,1,0), which means that the
axis title: def 3: is drawn in the fourth line of the margin starting from the plot region,  
tick labels: def 1: are drawn in the second line
ticks: def 0: itself is the first line.

Sunday, October 21, 2012

Java run with external jar



1. When classpath is used, jar command is ignored . So use "classpath" with class file and "cp" with jar file

2. NEVER FORGET TO PUT : even you use only 1 external jar. In some cases you will get compilation error if you don't put :

java  -classpath  /media/Data/eclipseWorkspace/myExternalJar/commons-math-1.2.jar:  com.cbrc.motifmap.CombinationOfTF

java  -cp /media/Data/eclipseWorkspace/myExternalJar/commons-math-1.2.jar:    -jar myJar.jar

java comparator sorted linked hash map treemap


////////////  inner class /////////////////


    class ValueComparator implements Comparator {

        LinkedHashMap  base;
        public ValueComparator(LinkedHashMap hMap) {
            this.base = hMap;
        }

        public int compare(Object a, Object b) {

            Myclass obj1 = (Myclass) base.get(a);
            Myclass obj2 = (Myclass) base.get(b);

            if( obj1.getRankDistance() > obj2.getRankDistance()    ) {
                return -1;
            }  else {
                return 1;
            }
        }

    }

    class Myclass
    {
        double rankDistance;
        DMFmotif  motif;
        public Myclass( double rankDistance, DMFmotif motif) {
            super();
            this.rankDistance = rankDistance;
            this.motif = motif;
        }
        public double getRankDistance() {
            return rankDistance;
        }
        public void setRankDistance(double rankDistance) {
            this.rankDistance = rankDistance;
        }
        public DMFmotif getMotif() {
            return motif;
        }
        public void setMotif(DMFmotif motif) {
            this.motif = motif;
        }


    }


//////////////////////////////////////////////// HASH MAP /////////////


LinkedHashMap lhMap = new LinkedHashMap();
ValueComparator vcomp =  new ValueComparator(lhMap);
TreeMap sorted_map = new TreeMap(vcomp);

        //         1. First insert valus into orig hashmap loop ....

        for(int i=0 ; i< vecDMFmotif.size() ; i++) {
            DMFmotif cur = vecDMFmotif.get(i);
            lhMap.put( cur.getId() ,    new Myclass(cur.getRankValue() ,cur )  ) ;


        }


        //         2.  // NOW SORTING IS DONE
        for (String key:lhMap.keySet()) {
            sorted_map.put(key, lhMap.get(key));
        }

        //         3. Write sorted values       
        Myclass inner;
        Set set = sorted_map.entrySet();
        Iterator i = set.iterator();
        while(i.hasNext()) {
            Map.Entry me = (Map.Entry)i.next();
            inner =  (Myclass)me.getValue()  ;

            System.out.println( inner.getRankDistance() + "\t"  +           inner.getMotif().getId() );             
        }    



Thursday, October 11, 2012

LATEX basic command


Float
  • 3 float Table, Figure, Picture
  • Caption can be given only to figure


tabular

  is just an array. Not float


\begin{table}
\begin{center}
\begin{tabular} { |r | r | r | c | } \hline
\multicolumn{4}{c}{this is header of table} \\ \hline
1 & 2 & 3 & 4 \\ \hline
\multicolumn{4}{c}{this is text} \\ \hline
\bf 4 & \it 5 & 6 & val \\ \hline \hline
\end{tabular}
\caption{mycaption}
\label{lbl:mylabel1}
\end{center}
\end{table}


Math environment

  • running text  $  write here $ 
  • centered & new line \[  ]\ 

\[
\left(
\begin{array}{cc}
a & b
\end{array}
\right)
\]
\[  2^{2^{\cdot{^2}}}  \]

Bibtex

.....
at end of document write
\bibliography{mybib.bib}

to cite
\cite{label1}
\cite{lab1}
\bibliography{1}
\bibliographystyle{plain} % alpha / unsrt / siam / ieeetr
\end{document}


Thursday, October 4, 2012

LATEX beamer tutorial

% This text is proprietary.
% It's a part of presentation made by myself.
% It may not used commercial.
% The noncommercial use such as private and study is free
% Dec 2007
% Author: Sascha Frank
% University Freiburg
% www.informatik.uni-freiburg.de/~frank/
%
%
\documentclass{beamer}
\setbeamertemplate{navigation symbols}{}


\usetheme{Warsaw}

\beamersetuncovermixins{\opaqueness<1>{25}}{\opaqueness<2 -="-">{15}}
\begin{document}
\title{Beamer Class Warsaw}
\author{Sascha Frank}
\date{\today}


\begin{frame}
\titlepage
\end{frame}

\begin{frame}\frametitle{Table of contents}\tableofcontents
\end{frame}


\section{Section no.1}
\begin{frame}\frametitle{Title}
Each frame should have a title.
\end{frame}
\subsection{Subsection no.1.1  }
\begin{frame}
Without title somethink is missing.
\end{frame}


\section{Section no. 2}
\subsection{Lists I}
\begin{frame}\frametitle{unnumbered lists}
\begin{itemize}
\item Introduction to  \LaTeX
\item Course 2
\item Termpapers and presentations with \LaTeX
\item Beamer class
\end{itemize}
\end{frame}

\begin{frame}\frametitle{lists with pause}
\begin{itemize}
\item Introduction to  \LaTeX \pause
\item Course 2 \pause
\item Termpapers and presentations with \LaTeX \pause
\item Beamer class
\end{itemize}
\end{frame}

\subsection{Lists II}
\begin{frame}\frametitle{numbered lists}
\begin{enumerate}
\item Introduction to  \LaTeX
\item Course 2
\item Termpapers and presentations with \LaTeX
\item Beamer class
\end{enumerate}
\end{frame}

\begin{frame}\frametitle{numbered lists with pause}
\begin{enumerate}
\item Introduction to  \LaTeX \pause
\item Course 2 \pause
\item Termpapers and presentations with \LaTeX \pause
\item Beamer class
\end{enumerate}
\end{frame}

\section{Section no.3}
\subsection{Tables}
\begin{frame}\frametitle{Tables}
\begin{tabular}{|c|c|c|}
\hline
\textbf{Date} & \textbf{Instructor} & \textbf{Title} \\
\hline
WS 04/05 & Sascha Frank & First steps with  \LaTeX  \\
\hline
SS 05 & Sascha Frank & \LaTeX \ Course serial \\
\hline
\end{tabular}
\end{frame}


\begin{frame}\frametitle{Tables with pause}
\begin{tabular}{c c c}
A & B & C \\
\pause
1 & 2 & 3 \\
\pause
A & B & C \\
\end{tabular}
\end{frame}


\section{Section no. 4}
\subsection{blocs}
\begin{frame}\frametitle{blocs}

\begin{block}{title of the bloc}
bloc text
\end{block}

\begin{exampleblock}{title of the bloc}
bloc text
\end{exampleblock}


\begin{alertblock}{title of the bloc}
bloc text
\end{alertblock}
\end{frame}

\section{Section no. 5}
\subsection{split screen}

\begin{frame}\frametitle{splitting screen}
\begin{columns}
\begin{column}{5cm}
\begin{itemize}
\item Beamer
\item Beamer Class
\item Beamer Class Latex
\end{itemize}
\end{column}
\begin{column}{5cm}
\begin{tabular}{|c|c|}
\hline
\textbf{Instructor} & \textbf{Title} \\
\hline
Sascha Frank &  \LaTeX \ Course 1 \\
\hline
Sascha Frank &  Course serial  \\
\hline
\end{tabular}
\end{column}
\end{columns}
\end{frame}

\subsection{Pictures}
\begin{frame}\frametitle{pictures in latex beamer class}
\begin{figure}
\includegraphics[scale=0.5]{PIC1.png}
\caption{show an example picture}
\end{figure}
\end{frame}

\subsection{joining picture and lists}

\begin{frame}
\frametitle{pictures and lists in beamer class}
\begin{columns}
\begin{column}{5cm}
\begin{itemize}
\item<1 -="-"> subject 1
\item<3 -="-"> subject 2
\item<5 -="-"> subject 3
\end{itemize}
\vspace{3cm}
\end{column}
\begin{column}{5cm}
\begin{overprint}
\includegraphics<2>{PIC1}
\includegraphics<4>{PIC2}
\includegraphics<6>{PIC3}
\end{overprint}
\end{column}
\end{columns}
\end{frame}


\subsection{pictures which need more space}
\begin{frame}[plain]
\frametitle{plain, or a way to get more space}
\begin{figure}
\includegraphics[scale=0.5]{PIC1}
\caption{show an example picture}
\end{figure}
\end{frame}



\end{document}

Wednesday, August 29, 2012

C/C++ header file inlcude "" OR<> how to use std ?



Suppose your project current directory is

0.   /home/myproj/

Let assume include foder are


1. /home/include1/
2. /home/include2/

==>  # include "myhead.h"
   search for directory in order 0 , then 1 , then 2

AS IT SEARCH CURRENT DIR FIRST, USUALLY user defined headers used in this way.

==>  # include
   search for directory in order 1 , then 2 , then 0

AS IT SEARCH INCLUDE DIR FIRST,  standard headers used in this way.

Header file extension:
==================
In C : In early days it was .hxx, .hpp . Then it was .h

    #include myheader.hxx
    #include myheader.hp
    #include myheader.hpp

In C++ : No extension. They are included in namespaces.

   #include
   using namespace std; // use original namespace

you can also use it in code as std::stringfunction

Tuesday, August 28, 2012

linux configuration file : bash_profile / bash_logout / bashrc


see also : http://www.troubleshooters.com/linux/prepostpath.htm

source : http://www.hypexr.org/bash_tutorial.php#whatis

In your home directory, 3 files have a special meaning to Bash, allowing you to set up your environment automatically when you log in and when you invoke another Bash shell, and allow you to execute commands when you log out.
These files may exist in your home directory, but that depends largely on the Linux distro you're using and how your sysadmin (if not you) has set up your account. If they're missing, Bash defaults to /etc/profile.
You can easily create these files yourself using your favorite texteditor. They are:
  • .bash_profile : read and the commands in it executed by Bash every time you log in to the system
  • .bashrc : read and executed by Bash every time you start a subshell
  • .bash_logout : read and executed by Bash every time a login shell exits
Bash allows 2 synonyms for .bash_profile : .bash_login and .profile. These are derived from the C shell's file named .login and from the Bourne shell and Korn shell files named .profile. Only one of these files is read when you log in. If .bash_profile isn't there, Bash will look for .bash_login. If that is missing too, it will look for .profile.
.bash_profile is read and executed only when you start a login shell (that is, when you log in to the system). If you start a subshell (a new shell) by typing bash at the command prompt, it will read commands from .bashrc. This allows you to separate commands needed at login from those needed when invoking a subshell.
However, most people want to have the same commands run regardless of whether it is a login shell or a subshell. This can be done by using the source command from within .bash_profile to execute .bashrc. You would then simply place all the commands in .bashrc.



Example .bash_profile
==================

# .bash_profile
if [ -f ~/.bashrc ]; then
    . ~/.bashrc
fi


Example .bashrc (Now add all your installation path in .bashrc)
============================================
 
# .bashrc
# _LIBRARY PATHs
PYTHONPATH=/home/tanviralam/kaustmachine/basedir/Python-2.7.6/

PATH=$PATH:$PYTHONPATH
export PATH


Sunday, August 26, 2012

Sun Grid Engine : SGE : Basic Commands


http://www.ats.ucla.edu/clusters/common/computing/batch/sge.htm

http://web.njit.edu/topics/HPC/basement/sge/SGE.html

http://star.mit.edu/cluster/docs/0.92rc2/guides/sge.html




0. First see available q : qstat -g c

1. ALWAYS USE -q queuename

2. To use any particular node use -l hostname (  -l hostname=himem03 )



qsub   -q  normal.q   -e pythonTeswt.err -o pythonTeswt.out   -cwd -S /bin/bash   -l hostname=himem03  testSingledataprep.sh


## using specific node

qsub   -q  normal.q   -e pythonTeswt.err -o pythonTeswt.out   -cwd -l hostname=himem03  -S /bin/bash   -l hostname=himem03  testSingledataprep.sh


python mafParser.py $mafRoot  $dataRoot  $fNameSpecies $fNameSubEXONLncrna  $fnmStatCount $fnmStatPrcnt $fnmStatSub $fnmSUBEXONStatDETECT  $fnapslncBed

/// check my submitted jobs

qstat
// alluser

qstat -u "*"

///// all details of user and nodes

userstat

/// details of queues

qstat -g c

// delete jobs

qdel jobID

// node informations
qhost

--------  RUN any binary installed in path -----------------------

qsub -q  normal.q -cwd -b y -o 1.out -e 1.err -q normal.q -N job1 binary
------------------  JAVA  -------------------------
#!/bin/sh
qsub -q  normal.q -e myErr.err -o myOut.out -S  /usr/bin/java Test.class

-------------------- PYTHON ----------------
#!/bin/sh
qsub -q  normal.q   -e myErr.err -o myOut.out  -s /usr/bin/python  test.py

--------------------  To run in cluster using several nodes ------------
-pe smp 32
// it will try to assign 32 nodes for this job

Saturday, August 25, 2012

code p-value calculation


  pvalue/p-value calculation

HYPERGEOMETRIC TEST


http://mengnote.blogspot.com/2012/12/calculate-correct-hypergeometric-p.html

# x, q     vector of quantiles representing the number of white balls drawn without replacement from an urn which contains both black and white balls.
# m     the number of white balls in the urn.
# n     the number of black balls in the urn.
# k     the number of balls drawn from the urn.
# phyper(q, m, n, k, lower.tail = TRUE, log.p = FALSE)

sucessDrawn=1
sucessTotal=2

sampleSize=24
populatonSizeWithoutSample=1751-sampleSize

phyper(sucessDrawn-1, sucessTotal, populatonSizeWithoutSample, sampleSize, lower.tail = FALSE, log.p = FALSE)

FISHER EXACT TEST


Bonferroni correction: Pvalue* (#of test cases)

http://www.une.edu.au/WebStat/unit_materials/c7_anova/oneway_bonferroni_adjust.htm

http://www.langsrud.com/stat/fisher.htm

Code done by  Benoit Marchand

/*
 * pvalcalc.cpp
 *      Author: 
Benoit Marchand
 */
#include
#include


    /*
     *                 CLASSA    CLASSB
     *                        -----------------
     *  PRESENT    |      |        |    TOTALpresent= n
     *  ABSENT      |          |        |    TOTALabsent
     *                         ------------------------------------------------
     *     K= TOTALclassA|        |  N = TOTALclass A + B
     * 

/*
 *  k - #PRESENT in CLASSA
 *  n - #PRESENT in CLASSA + CLASSB
 *
 *  K - #TOTAL sample in CLASSA
 *  N - #TOTAL sample in CLASSA + CLASSB
 */
double calcOneMotifPValue(int k,int n,int K,int N)
{
        double pv,t1,x;

/*
        k      n-k         |  n
        K-k    (N-K)-(n-k) |  N-n
        -------------------------
        K      N-K         |  N


        This calculates the Right Hand Side PValue.
        (LHS would be k=0,n.)
*/
        t1=lgamma(((float) K+1))+lgamma(((float) N-K+1))-lgamma(((float) N+1))+lgamma(((float) n+1))+lgamma(((float) N-n+1));
        for (pv=-1e100;k<=n && k<=K && (n-k)<=(N-K);k++) {
                x=t1-lgamma(((float) k+1))-lgamma(((float) K-k+1))-lgamma(((float) n-k+1))-lgamma(((float) N-K-n+k+1));
                pv=(pv>=x?pv+log(1.0+exp(x-pv)):x+log(1.0+exp(pv-x)));
        }

        // Bonferroni correction (pval * nSEQF)
        //rankDB[id].pvalue=((float) rankDB[id].nSEQF)*exp(pv);
        //rankDB[id].pvalue=exp(pv);
        return(exp(pv));
}


int main()
{
    printf("%.8f", calcOneMotifPValue( 75,100, 155, 200) );
}



////////////  JAVA VERSION 
import org.apache.commons.math.special.Gamma;

/*
     *                 CLASSA    CLASSB
     *                        -----------------
     *  PRESENT    |    k    |        |    TOTALpresent=n
     *  ABSENT      |          |        |    TOTALabsent
     *                         -------------------------------
     *         TOTALclassA  |        |      TOTALclass A + B
     *                 K                                    N
     */

    /*
     *  k - #PRESENT in CLASSA
     *  n - #PRESENT in CLASSA + CLASSB
     *
     *  K - #TOTAL sample in CLASSA
     *  N - #TOTAL sample in CLASSA + CLASSB
     */
    public static double calcRightsidePValue(int hitA,int hitAB,int totA,int totAB)
    {
            double pv,t1,x;

    /*
            k      n-k         |  n
            K-k    (N-K)-(n-k) |  N-n
            -------------------------
            K      N-K         |  N


            This calculates the Right Hand Side PValue.
            (LHS would be k=0,n.)
    */
            t1= Gamma.logGamma(((float) totA+1))+Gamma.logGamma(((float) totAB-totA+1))-Gamma.logGamma(((float) totAB+1))+Gamma.logGamma(((float) hitAB+1))+Gamma.logGamma(((float) totAB-hitAB+1));
            for (pv=-1e100;hitA<=hitAB && hitA<=totA && (hitAB-hitA)<=(totAB-totA);hitA++) {
                    x=t1-Gamma.logGamma(((float) hitA+1))-Gamma.logGamma(((float) totA-hitA+1))-Gamma.logGamma(((float) hitAB-hitA+1))-Gamma.logGamma(((float) totAB-totA-hitAB+hitA+1));
                   
                    pv=( pv>=x  ? pv+  Math.log(1.0+Math.exp(x-pv))  :  x+ Math.log(1.0+ Math.exp(pv-x)));
            }

            // Bonferroni correction (pval * nSEQF)
            //rankDB[id].pvalue=((float) rankDB[id].nSEQF)*exp(pv);
            //rankDB[id].pvalue=exp(pv);
            return( Math.exp(pv));
    }


Fisher 's exact test in R package: stats
===========================
myContin = matrix(c(1,4,7,4), nrow = 2);
fisher.test(myContin ,alternative = "two.sided" ) ; // two sided

fisher.test(myContin ,alternative = "greater" ) ; // right-hand sided
fisher.test(myContin ,alternative = "less" ) ; // left-hand sided

Tuesday, August 21, 2012

Python tutorial




How to install different packages:


In MAC, use pip

pip install -U numpy scipy scikit-learn

How to see where the package is installed


In MAC: pip show numpy


Import all

============

import className
OR
from package import className


Code Structure
========================

1.  just write script one line by one

print "ab"
print "cd"

2. Use function (Here you must need a main function):

def myfunc():
    print "Inside fnc ab"

if __name__ == '__main__':
    pass
    myfunc()
    objUtil = Utilities.printString("Hello world")

3. Use like java, class + function +main


myclass.py
--------------------

class LncRNA(object): 
    def __init__(self,chr , chrStart , chrEnd):
        self.chr = chr
        self.chrStart = chrStart
        self.chrEnd = chrEnd
       
    def classfunction():
        sOut='';
        sOut+= str(self.chr) + '\t' + str(self.chrStart) + '\t' + str(self.chrEnd);
        return sOut;

caller.py
--------------
def myfunc():
    print "Inside fnc ab"

if __name__ == '__main__':
    pass
    myfunc()
    obj = LncRNA("chr1" , 120 , 230)
    obj.classfunction()

 Like java : Inner class + function +
-----------------------------------------

class myinnter:
          def __init__(self,larray):                self.array = larray
           def fncinner:
                print "hi inner"

def myfnc:
      print "inside fnc"

if __name__ == '__main__':
    myfunc()
    obj = myinner;
   obj.fncinner


Return value  No need to write return in function definition
==============================

def myret(): 
 return val;


print
============
( like java , strins can be added by +)
( like c  BUT use % instead of , "%d" %i)
============================

print "#lncrna in %s" %chrName + " is: %d" %chrmWiseLNCRNAList.__len__()

==============================================
Array

Completely different from C/Java. It is like list. But only 1 type can be saved 
===============================================

// First way to create array like list
myArr = array('L')
myArr.append(54)
myArr.append(54)
print myArr[i]

// second way to create array like list
arrSize = 5;
myArr = [ 0 for val in range(arrSize)]
print myArr[i]


def arrayProcess():
    noRow = 2
    noCol = 4;
    table= [
            [ 0 for j in range(noCol) ]
                for i in range(noRow)
            ]
    print table
    print 'DONE'
    inc = 1
    for i in range(noRow):
        for j in range(noCol):
            table[i][j]= inc
            inc +=1
           
    for d1 in range(noRow):
        for d2 in range(noCol):
            print str(table[d1][d2]) + '\t',          
        print '\n',

for loop increment or step by 2 other than 1
==============================

for i in range( start, end , inc):
    print "Processing: " + i

for loop indexing x[ i : j] ==> from i to (j-1)
output for both case : 0 1 2
===========================
for i in range(0 ,3 ,1):
    print i,

a = []
a.append(0);a.append(1);a.append(2);a.append(3);
for i in a[0:3]:
    print i,


Basic string operation
=================

def stringProcess():
    s = "Hello World"
    myfields = s.split()
    for field in myfields:
        print field
    print "Str len is:" + str( s.__len__() )

    ####  capital 1st letter ####

    print s.capitalize() 
    ##### index find ####

    print "position of ello is: " + str( s.find("ello") )

    ##### substing ####
    print "substring: " + s[ s.find("ello"): 5]

   #### compare####
    if str(chrName).lower().__eq__('chr1') :
            return 0;
     else:
            return 1;

  ######  logical negation #######

   if not str(chr).lower().__eq__(chrmName ):
            continue

    #### concat####
    s = ''
    print s + " how are you?" # CONCAT

    #### replace ####
    newString= s.replace("H", "M")
    print newString
    print s

    ##### trim / strip ####
    print s.strip()
    # check prefix & postfix
    print s.startswith("He")
    print s.endswith("world")


L1. Create class
==============


class Motif(object):
    staticVar=5
    @staticmethod
    def staticFnc():
        print 555

    def __init__(self, id, name , value):
        self.id = id
        self.name = name
        self.value = value
   
    def __str__(self, *args, **kwargs):
        sOut='';
        sOut+= str(self.id) + '\t' + self.name + '\t' + str(self.value);
        return sOut;   
   
    def update(self,inc):
        self.value = self.value + inc


**** calling *******

from com.cbrc.bean.Motif import Motif

obj = Motif(1,'tanvir',10)
obj.update(5)
Motif.staticFnc()
print Motif.staticVar

obj.update(99)
print obj.value

print obj

2. Read from file
==============
from scipy.io.fopen import fopen

def readIt( fnameIn,fnameOut):  
    fidOut= open(fnameOut , "w"); 
    fid = open(fnameIn , "r");
    for curLine in fid:
        fidOut.write(curLine);
      
    fid.close();
    fidOut.close();

3. String split with regular expression
========================

import re

myPat= re.compile("[\s]+")
myFields = myPat.split(curLine )
        for col in myFields:
            print col


4. List of tuple
===============

def tupleProcess( fnameIn,fnameOut):
    myPat= re.compile("[\s]+")
    myList = list() # TUPLE OBJECT DECLARATION
   
    fidOut= open(fnameOut , "w");  
    fid = open(fnameIn , "r");
    for curLine in fid:
        myFields = myPat.split(curLine );
        chr = myFields[0]
        chrStart = int(myFields[1])
        chrEnd = int(myFields[2])
        myBed = (chr, chrStart, chrEnd) # tuple formation
        myList.append(myBed)
       
    # write data of LIST
    print myList.__len__()
    for curBed in myList:
        fidOut.write( curBed[0] + '\t' + str(curBed[1]) )
        fidOut.write("\n") 
  fid.close();

  # write buffer data to File
def writeFileFromBuffer( fnmOut, myBuffer):

    target = open(fnmOut ,"w")
    target.write(myBuffer + '' )
    target.close()


4 Vector of Objects
=================


class MyClass(object):
    def __init__(self, number):
        self.number = number
if __name__ == "__main__":
   my_objects = []

   for i in range(5):
      my_objects.append(MyClass(i))

   for obj in my_objects:
      print obj.number
==========================
5. Hashtable / Dictionary

# For un-ordered dictionary / oedered hashmap
myHash = dict()
myHash['mykey'] = val
print myHash['mykey']

# For ordered dictionary / oedered hashmap
import collections
dictLinesNR = collections.OrderedDict();

==========================

class LncRNA(object): 
    def __init__(self,chr , chrStart , chrEnd):
        self.chr = chr
        self.chrStart = chrStart
        self.chrEnd = chrEnd
       
    def __str__(self, *args, **kwargs):
        sOut='';
        sOut+= str(self.chr) + '\t' + str(self.chrStart) + '\t' + str(self.chrEnd);
        return sOut;

   def __getAnyValue__(self, *args, **kwargs):
    return self.chrStart;


def dictionaryProcess( fnameIn,fnameOut):
    myPat= re.compile("[\s]+")
    myDict = dict() 
    fidOut= open(fnameOut , "w");  
    fid = open(fnameIn , "r");
    for curLine in fid:
        myFields = myPat.split(curLine );
        chr = myFields[0]
        chrStart = int(myFields[1])
        chrEnd = int(myFields[2])
        myRNA = LncRNA(chr, chrStart, chrEnd) # tuple formation
        print myRNA
        if myDict.__contains__(chr):
          print "Already exist: " + chr
    else:
                myDict[chr] =   myRNA      # INSERT NEW ELEM   
    # write data of DICTIONARY
    print myDict.__len__()
    for k, v in myDict.iteritems():
        fidOut.write( k + '\t+ v.__getAnyValue__()+ '\t' + v.__str__() )
     
        fidOut.write("\n")

   fid.close();

============
6. SET
mySet = set()
mySet.add(elem)
============

def setProcess( fnameIn,fnameOut):
    myPat= re.compile("[\s]+")
    mySet = set() 
    fidOut= open(fnameOut , "w");  
    fid = open(fnameIn , "r");
    for curLine in fid:
        myFields = myPat.split(curLine );
        chr = myFields[0]
        chrStart = int(myFields[1])
        chrEnd = int(myFields[2])
        mySet.add(chr)  # INSERT NEW ELEM    

   fid.close();

    mySet2 = set([ 'chr2', 'chrX']);   
    # write data of SET
    print mySet.__len__()
    for elem in mySet:
        fidOut.write( elem )
        fidOut.write("\n")
    fidOut.close();

    sortedSet = sorted ( mySet , key= int )  
    uni = mySet | mySet2
    intrsct = mySet & mySet2
    diff = mySet - mySet2



2D Array
==============

Creation and initialize with 0:

myMat = [[0 for x in range( xDimention)] for y in range(yDimension)]

Write
def writeMatrix2D(myMat,  myseperator, outFile):
    print "writing matrix of count"    xSize = len(myMat)
    ySize = len(myMat[0])

    mybuffer = ''    for d1 in range(xSize):
        for d2 in range(ySize):
            if d2== (ySize-1):
                mybuffer += str(myMat[d1][d2])
            else:
                mybuffer += str(myMat[d1][d2]) + myseperator
        mybuffer += "\n"    writeFileFromBuffer(outFile, mybuffer)


Write buffer content to file

def writeFileFromBuffer( fnmOut, myBuffer):

    target = open(fnmOut ,"w")
    target.write(myBuffer + '' )
    target.close()



NUMPY OPERATIONS



Matrix operations


Matrix Creation
matrixNP1 = numpy.matrix(  numpy.zeros(shape=(totState , totHour)  , dtype = float) , dtype=float )

#Sum of Column
sumColumn = matrixNP1.sum(axis=0)

# Sum of Row
sumRow = matrixNP1.sum(axis=1)


# Divide each column by a vector ( e.g. sum of column)
matrixNP1 = matrixNP1/ matrixNP1.sum(axis=1)


# Divide each element  by a scalar
matrixNP1 = matrixNP1/1




Exception Handling


http://stackoverflow.com/questions/4990718/python-about-catching-any-exception



import sys
try:
    f = open('myfile.txt')
    s = f.readline()
    i = int(s.strip())
except IOError as (errno, strerror):
    print "I/O error({0}): {1}".format(errno, strerror)
except ValueError:
    print "Could not convert data to an integer."
except:  # handle all exceptions
    print "Unexpected error:", sys.exc_info()[0]
    raise




Thursday, August 2, 2012

linux installation common command


1. ./configure

2. make

3. make check

4. make install

Wednesday, July 25, 2012

Bioinformatics tool


0. BIOPYTHON mafIO
==================

    def mafParsing(self,chrName ):
        idx = MafIndex(chrName + ".mafindex", chrName + ".maf", "hg19." + chrName )
        multiple_alignment = idx.get_spliced([60000] ,  [60020] , strand = "-")      
        AlignIO.write(multiple_alignment, "hg19_All.fa", "fasta")
        total_bases_tupBel1  = 0;
        for seqrec in multiple_alignment:
                if seqrec.id.startswith("tupBel1"): # mm4  tupBel1  tanvir
                    # don't count gaps as bases
                    total_bases_tupBel1 += len( str(seqrec.seq).replace("-", "") )
                  
        print "total of %s base align" %total_bases_tupBel1  + " %s"   %total_bases_tupBel1

1. mafsInRegion
=====================================

desc:  Find multi alignment output in different species

usage:  ./mafsInRegion -keepInitialGaps   -outDir test.bed out_dir chr22.maf


precaution:

1. use -outDir if you want gene name specific output in seperate file (1 file per 1 sample).




2. liftOver
=====================================

desc: Lift  co-ordinate of one assembly over another assembly. Give output according to similarity match.


usage: ./liftOver old.chain new.chain mapped.out unMapped.out


precaution:

for single species use similaritythreshold 0.95
for multiple species use similarity thr 0.10
It is not optimized for multiple species


3. match@ TRANSFAC professional
=====================================

desc: Find hit location of matrix over sequence


usage: ./match  matrix.dat promoter.seq  map.out matrix.profile


precaution:

if a matrix's profile is not available in profile file, this matrix will not be mapped over sequence.

4.OrthoMCL
==================================

desc: Find orthologous gene from >=2 species

installation:

a.  download  the unsupported version. If you download full version you have to install mysql and others.

http://orthomcl.org/common/downloads/software/unsupported/v1.4/

There are 2 zip files. One for OrthoMCL and second one for MCL.

b. compile MCL: (Read the install file for help)
      unzip
      ./configure
      make
      make install
c. change the path name in OrthoMCL folder file orthomcl_module.pm

for variable $BLASTALL , $FORMATDB , $MCL, $$PATH_TO_ORTHOMCL, $ORTHOMCL_DATA_DIR

d.  Keep all the fasta file in $ORTHOMCL_DATA_DIR  folder.

usage: (See README under OrthoMCLv1.4)

perl orthomcl.pl --mode 1 --fa_files Ath.fa,Hsa.fab


precaution


5. blastp
==========================

blastp -evalue 1e-05 -query testBee.prot.fa -subject testBee.prot.fa -outfmt 6 -out myout.txt

6. apriori algorithm
===========================
// min support 90% , maximal subset , unlimited #item in a itemset.
apriori.exe -tm -s90  gene_tf.input  gene_tf_90.out