Tuesday, September 16, 2014

JAVA pattern match pattern search split example

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Check if input match a pattern

Pattern curPat = Pattern.compile( "tanvir"+ "(.)+" + "tanvir");
Matcher mymatcher;

String input="tanvirAAAAAAAtanvir";
mymatcher = curPat.matcher( input );

        System.out.println("Matched pattern: "+ input );
// List all hit with position in input string
m=  ConstantValue.patReplicaFantom.matcher(colNameUnReadable) ;
 while (m.find()) {
                        System.out.print("Start index: " + m.start());
                        System.out.print(" End index: " + m.end());
                        System.out.println(" Found: " + m.group());

// Split a input according to pattern

Pattern patFastaHeade = Pattern.compile("[>_]+"); 
String text=">aaaa";
String tmp[];
tmp = patFastaHeade.split(text);
for( int i=0; i < tmp.length ; i++) }
              System.out.println( tmp[i] );

Monday, September 8, 2014

CAGE / RNA-seq normalizaton

Various normalization techniques in different tools

1. DESeq

Input a N by C matrix:
 N row ;  N genes
C column; Each column represents replica and different condition
Each cell : Represents the integer tag count value OR whatever value you want.

1. From count matrix, for each row, calculate the geometric mean.
2. Diving each value by GM of corresponding row which generates size factor (SF) of each column.
3. Divide each count value by SF


function doDESEQnorm

    0    0    0    0    0    0    1;
    92    161    76    70    140    88    70;
    5    1    0    0    4    0    0;
    0    2    1    2    1    0    0 ];
% This shows four gene; in tow condition untreated and treated; First four replica for "untreated" and last %three column for "treated";

sizeRow = size(mymat,1);
sizeCol = size(mymat,2);

%% estimateSizeFactors %% 1 - Find Geometric Mean (GM) of Each row
gmRow = zeros(1,sizeRow);
for i=1:sizeRow
    curRow = mymat(i,:)';
    nzId = find(curRow);
    tmpVal = curRow(nzId);
    gm = geomean(tmpVal);
    gmRow(i) = gm;
mymat3= mymat;

%% estimateSizeFactors %% 2 - Divide each colum by corresponding row GM
afterDiv = mymat3;
for i=1:sizeRow
    afterDiv(i,:) = mymat3(i,:) / gmRow(i);

%% estimateSizeFactors %% 3 - SizeFactor: Take the median of Non-Zero Mormalized values
sizeFactor = zeros(1,sizeCol);
for col=1:sizeCol
   nzRow = find( afterDiv(:,col) );
   nzVal = afterDiv(nzRow,col);
   nzVal = sort(nzVal);
%    numberNZ = size(nzVal,1);
%    if rem(numberNZ, 2) ==0
%        idx = numberNZ/2 ;
%    else
%        idx = fix(numberNZ/ 2) +1;
%    end
%    sizeFactor(col)= nzVal(idx,1);

   sizeFactor(col) = median(nzVal);

%% counts(sf, normalize=T) ; %% Do Normalzation
normValue = mymat;
for col=1:sizeCol
    normValue(:,col) = normValue(:,col)./sizeFactor(col);   


Friday, July 18, 2014

Expression analysis with R: package edgeR

Sample code for edgeR and DESeq for differential expressed gene

Best step-by-step code


Comparision of edgeR and DESeq


# edgeR 
## Make design matrix
condition < relevel(factor(meta$condition), ref="untreated")
libType < factor(meta$libType)
edesign < model.matrix(~libType+condition)
## Make new DGEList, normalize by library size, and estimate dispersion allowing
 possible trend with average count size
e < DGEList(counts=counttable)
e < calcNormFactors(e)
e < estimateGLMCommonDisp(e, edesign)
e < estimateGLMTrendedDisp(e, edesign)
e < estimateGLMTagwiseDisp(e, edesign)
## MDS Plot
plotMDS(e, main="edgeR MDS Plot")
## Biological coefficient of variation plot
plotBCV(e, cex=0.4, main="edgeR: Biological coefficient of variation (BCV) vs abundance")
## Fit the model, testing the coefficient for the treated vs untreated comparison
efit < glmFit(e, edesign)
efit < glmLRT(efit, coef="conditiontreated")
## Make a table of results
etable < topTags(efit, n=nrow(e))$table
etable < etable[order(etable$FDR), ]
## ~MA Plot
with(etable, plot(logCPM, logFC, pch=20, main="edgeR: Fold change vs abundance"))
with(subset(etable, FDR<0.05), points(logCPM, logFC, pch=20, col="red"))
abline(h=c(-1,1), col="blue")

Tuesday, May 27, 2014


Formatting Database for Blast+


To use any sequence by Blast/Blast+ , we need to format the fasta . Let's do it  for uniprot:

1. download uniprot from the link provided

2. issue the following command for blast+:
/home/apps/ncbi/blast+/2.2.29/bin/makeblastdb -dbtype prot -in /path/to/uniprot.fasta

To get blast+ commands, if you are familiar with legacy blast, use legacy_blast.pl in the following way:

$ /home/apps/ncbi/blast+/2.2.29/bin/legacy_blast.pl formatdb -i test.fasta -p T --path /home/apps/ncbi/blast+/2.2.29/bin --print_only

/home/apps/ncbi/blast+/2.2.29/bin/makeblastdb -dbtype prot -in test.fasta

Saturday, May 24, 2014

Generate ssh key using ssh-keygen : gnerate both public and private key


Setting up SSH public/private keys

SSH (Secure Shell) can be set up with public/private key pairs so that you don't have to type the password each time. Because SSH is the transport for other services such as SCP (secure copy), SFTP (secure file transfer), and other services (CVS, etc), this can be very convenient and save you a lot of typing.

SSH Version 2

On the local machine, type the BOLD part. The non-bold part is what you might see as output or prompt.
  • Step 1:
    % ssh-keygen -t dsa
    Generating public/private dsa key pair.
    Enter file in which to save the key (~/.ssh/id_dsa):
    (just type return)
    Enter passphrase (empty for no passphrase):
    (just type return)
    Enter same passphrase again:
    (just type return)
    Your identification has been saved in ~/.ssh/id_dsa
    Your public key has been saved in ~/.ssh/id_dsa.pub
    The key fingerprint is:
    Some really long string
  • Step 2:
    Then, paste the content of the local ~/.ssh/id_dsa.pub file into the file ~/.ssh/authorized_keys on the remote host.
  • RSA instead of DSA
    • If you want something strong, you could try
      % ssh-keygen -t rsa -b 4096
    • Instead of the names id_dsa and id_dsa.pub, it will be id_rsa and id_rsa.pub , etc.
    • The rest of the steps are identical.
That's it!
  • Q: I follow the exact steps, but ssh still ask me for my password!
  • A: Check your remote .ssh directory. It should have only your own read/write/access permission (octal 700)
    % chmod 700 ~/.ssh

Monday, May 12, 2014

Java Plots / graphs using jfreechart jfree chart


There's even some examples right in the code, such as the one described here:

        final XYDataset dataset = createDataset();
        final JFreeChart chart = createChart(dataset);
        try {
            File f1 = new File("myline.png");
            ChartUtilities.saveChartAsPNG(f1, chart, 500, 400);
        } catch (Exception e) {

Sunday, May 11, 2014


Open Control panel

sudo  /opt/lampp/share/xampp-control-panel/xampp-control-panel
How to start / stop apache server in xampp

sudo /opt/lampp/lampp start
sudo /opt/lampp/lampp stop

How to change default apache port 80 to another port:

Find the line "Listen 80" in any of these files and change to new port number.


How to install linux iso image

1. Download you iso file.

say myfile.iso

2. Make a directory where it will be mounted

sudo mkdir /mnt/iso

3. Mount the iso file there

sudo mount -t iso9660 -o loop /home/Download/myfile.iso /mnt/iso/
4. Install it

Thursday, April 24, 2014

Tutorial Bedtools Bedtool


##### PARAMTER for INTERSECT######:
## mention strandness
# -s
# print A , with number of Hit
#  -c
# print A , with number of bp overlap
# -wo

##### EXAMPLE ####


# bedtools intersect -s -c  -a $f1 -b $fCDS  > $f1Out
# bedtools intersect -s -wo  -a $f1 -b $fCDS  > $f1Out.bp

# bedtools intersect -s -c  -a $f2 -b $fCDS  > $f2Out
# bedtools intersect -s -wo  -a $f2 -b $fCDS  > $f2Out.bp

# bedtools intersect -s -wo -a test1.bed -b test2.bed

##### PARAMTER for GETFASTA ######:
## mention strandness
# -s
## input and output file
# -fi  -fo

##### EXAMPLE ####
# bedtools getfasta  -s  -fi /home/data/genomes/hg19/hg19.fa -bed $fAllgeneBed -fo $fAllgeneFasta

Sunday, February 9, 2014

RNA-seq expression software install and usage

Some basic terminology used in RNA-seq papers


Basic idea to calculate RPKM (Reads Per Kilo exon per Million mappedread)


First of all, Cufflinks uses FPKM(Fragments Per Kilobase of exon per Million mapped fragments) instead of RPKM(Reads Per Kilobase of exon per Million mapped reads) to avoid confusion when dealing with paired-end data.

Secondly, Cufflinks uses corrections when calculating FPKM, so if you do a simple calculation it will not match that of Cufflink's. Anyway, the crude calculation for a gene would be (NOT the one that Cufflinks uses):

FPKM = [f / (e / 1000)] / (m / 1,000,000)

f - number of fragments mapping to gene
e - exonic length of gene
m - total number of mapped fragments

If you would like to know more about the corrections that Cufflinks applies to FPKM, see this paper:
Trapnell C, Williams BA, Pertea G, Mortazavi AM, Kwan G, van Baren MJ, Salzberg SL, Wold B, Pachter L. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation
Nature Biotechnology doi:10.1038/nbt.1621

Supplementary Text and Figures, 3. Transcript abundance estimation

Also, have a look at Cufflink's FAQ


1. Cufflink: (pre-requisite : boost, samtools, elgen )




2.Tophat (pre-requisite : boost, samtools)



3. HTseq

4. fluxcapacitor

5. bedtools

Tuesday, February 4, 2014

clsuter user manual : Noor use matlab at noor : MATLAB GUI LOAD FROM CLUSTER

1. First you have to have an account at noor. If not call ithelpdesk for that.
2. Then login as
3. Load matlab under your account

load matlab/R2012b
4.  Run matlab by following command


Load matlab GUI
module load matlab
bsub -XF -I -q rh6_interactive matlab -desktop

Following is the basic for using cluster noor:
Following is the basic for using any tool in noor cluster:


Linux command to check number of processor core , memory ram size, disk usage disk size

To get the number of processor

less /proc/cpuinfo | grep processor

To get the details of each processor and number of core
# very detailed information
less /proc/cpuinfo | less

# summary information how many core/processor etc


To get the RAM size

1. In human readable format

free -g
free -m

2. In more details

less /proc/meminfo 

To get the disk size

1. In human readable format

df -h

Total size of a folder
Enter into folder and then
du -ch | grep total

Monday, February 3, 2014

Converter Bed to GENCODE v3 GTF


Gencode V3 GTF format is different from UCSC GFF/GTF. Here is the description of format. Gencode V3 GTF has 9 mandatory key=value pairs, which are not mandatory for UCSC GFF/GTF. Here is the description


Here is a code to convert bed file to Gencode v3 GTF format:

import java.util.Vector;
import java.util.regex.Pattern;

import com.cbrc.bean.TrxExonInfo;
import com.cbrc.common.CommonFunction;

public class BedTools_BedToGencodeGTFv3 {

        void convert_Bed_GencodeGtfv3(String fnmBed, String fnmGtf)

                String tmp[];
                Pattern p = Pattern.compile("[\\t]+");
                Vector vectBedStr = CommonFunction.readlinesOfAfile(fnmBed);
                StringBuffer buf = new StringBuffer();
                for(int i=0;i                 {
                        tmp = p.split(vectBedStr.get(i), 12);
                        TrxExonInfo trx = new TrxExonInfo(tmp[0], tmp[1], tmp[2], tmp[3],
tmp[4], tmp[5], tmp[6], tmp[7], tmp[8], tmp[9], tmp[10], tmp[11]) ;

                        // create 1 entry for transcript

                        buf.append(trx.getChrom()+"\t"+"LLlab"+"\t"+"transcript"+ "\t"+
+trx.getEnd()+"\t"+trx.getScore()+"\t"+trx.getStrand()+"\t" +
                                        "."+"\t"+ getAdditionalInfoMandatory(trx.getName() ) +"\n");

                        // create multiple entry for exons

                        for(int  e=0; e                         {
                                buf.append(trx.getChrom()+"\t"+"LLlab"+"\t"+"exon"+ "\t"+
                                                "."+"\t"+ getAdditionalInfoMandatory(trx.getName() ) +"\n");


                CommonFunction.writeContentToFile(fnmGtf, buf+"");


        String getAdditionalInfoMandatory(String trxID)
                        " gene_id "    + trxID + ";" +
                        " transcript_id " + trxID + ";" +
                        " gene_type "    + "RNA" + ";" +
                        " gene_status "    + "KNOWN" + ";" +
                        " gene_name "    + trxID + ";" +
                        " transcript_type "    + "RNA" + ";" +
                        " transcript_status "    + "KNOWN" + ";" +
                        " transcript_name "    + trxID + ";" +
                        " level "    + "1" + ";" ;


        public static void main(String[] args) {
                BedTools_BedToGencodeGTFv3 obj = new BedTools_BedToGencodeGTFv3();
                obj.convert_Bed_GencodeGtfv3(args[0],   args[1] );
                // example
//              obj.convert_Bed_GencodeGtfv3("./test.bed",
"./coding.withrpt.bed.wholebody.bed.gtf3" ); //
//              obj.convert_Bed_GencodeGtfv3("./noncoding.withrpt.bed.wholebody.bed.bed",
"./noncoding.withrpt.bed.wholebody.bed.gtf3" );



Sunday, February 2, 2014

Matlab string array using cell cell array

mycell = cell(10,1);
for i=1:5 
   mycell(i)=cellstr('mRNA prom');
for i=6:10 
   mycell(i)=cellstr('lncRNA prom');
boxplot(y , mycell ,'notch','on','whisker',1);

Sunday, January 26, 2014

Linux process management in foreground background




How to run a job in background

nohup /path/to/your/script.sh > /dev/null 2>&1 &

How to view background process

How to move job from foreground to background and vice versa

fg %jobid

bg %jobid

How to kill background jobs

kill -19 %job_id

Saturday, January 18, 2014

Linux GUI based browsing using vncserver and vncviewer

Main Source


user: alamt

Step1: Login to server
ssh alamt@

Step2: Start vncServer in login server
vncserver –geometry 1600x1024

Step3: Check the port no for vncServer in login server

 ps -ef | grep vnc | grep alamt
// It will show a big list with following port info by rfbport

 -rfbport 5905 ( so in this example portno for vncserver is 5905)
 -Xvnc :5 (so in this example screen ID is 5)

Step4: From client , run vncViewer


Step5: Kill/Stop vncserver from login server
 vncserver -kill:5