Sunday, February 9, 2014

RNA-seq expression software install and usage


Some basic terminology used in RNA-seq papers


http://thegenomefactory.blogspot.com/2013/08/paired-end-read-confusion-library.html




Basic idea of calculating RPKM (Reads Per Kilobase of exon per Million mapped reads)



http://seqanswers.com/forums/showthread.php?t=29549

First of all, Cufflinks uses FPKM (Fragments Per Kilobase of exon per Million mapped fragments) instead of RPKM (Reads Per Kilobase of exon per Million mapped reads) to avoid confusion when dealing with paired-end data.

Secondly, Cufflinks applies corrections when calculating FPKM, so a simple calculation will not match the value Cufflinks reports. Anyway, the crude calculation for a gene (NOT the one that Cufflinks uses) would be:

FPKM = [f / (e / 1000)] / (m / 1,000,000)

f - number of fragments mapping to gene
e - exonic length of gene
m - total number of mapped fragments
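
As a minimal sketch of the crude formula above (not the corrected estimate that Cufflinks computes), the arithmetic can be written in Python. The function name and the example numbers are hypothetical, chosen only for illustration.

```python
def crude_fpkm(fragments, exonic_length_bp, total_mapped_fragments):
    """Crude FPKM = [f / (e / 1000)] / (m / 1,000,000).

    fragments              -- f, number of fragments mapping to the gene
    exonic_length_bp       -- e, exonic length of the gene in bases
    total_mapped_fragments -- m, total number of mapped fragments
    """
    if exonic_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("exonic length and total mapped fragments must be positive")
    fragments_per_kilobase = fragments / (exonic_length_bp / 1000)
    return fragments_per_kilobase / (total_mapped_fragments / 1_000_000)


# Hypothetical example: 500 fragments on a gene with 2,000 exonic bases,
# in a library of 10 million mapped fragments.
print(crude_fpkm(500, 2000, 10_000_000))  # prints 25.0
```

Here 500 fragments over 2 kb give 250 fragments per kilobase, and dividing by 10 (million mapped fragments) gives 25.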

If you would like to know more about the corrections that Cufflinks applies to FPKM, see this paper:
Trapnell C, Williams BA, Pertea G, Mortazavi AM, Kwan G, van Baren MJ, Salzberg SL, Wold B, Pachter L. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation
Nature Biotechnology doi:10.1038/nbt.1621

Supplementary Text and Figures, 3. Transcript abundance estimation

Also, have a look at the Cufflinks FAQ.







Tools



1. Cufflinks (prerequisites: Boost, SAMtools, Eigen)

install:

http://cufflinks.cbcb.umd.edu/tutorial.html#inst

usage:



2. TopHat (prerequisites: Boost, SAMtools)

install:


usage:


3. HTSeq


4. Flux Capacitor


5. bedtools


Tuesday, February 4, 2014

Cluster user manual: using MATLAB on Noor and loading the MATLAB GUI from the cluster





1. First, you need an account on Noor. If you do not have one, contact the IT helpdesk.
2. Then login as
3. Load MATLAB under your account:

module load matlab/R2012b
4. Run MATLAB with the following command:

/opt/share/MATLAB/matlab.R2012b


Load the MATLAB GUI
===================
module load matlab
bsub -XF -I -q rh6_interactive matlab -desktop




The basics of using any tool on the Noor cluster are described here:

http://rcweb.kaust.edu.sa/KAUST/ResearchComputing/wiki/up2speed

Linux commands to check the number of processor cores, RAM size, disk size and disk usage



To get the number of processors
===============================

grep processor /proc/cpuinfo

To get the details of each processor and the number of cores
============================================================
# very detailed information
less /proc/cpuinfo

# summary information: how many cores, processors, etc.

lscpu

To get the RAM size
=============

1. In human-readable format

free -g    # sizes in gigabytes
free -m    # sizes in megabytes

2. In more detail

less /proc/meminfo


To get the disk size
=============

1. In human-readable format

df -h

Total size of a folder
======================
Enter the folder and then run:
du -sh .
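
The same checks can be scripted. The following is a rough sketch using only the Python standard library; the function names are made up for illustration, and the RAM query assumes a Linux system where `os.sysconf` exposes `SC_PAGE_SIZE` and `SC_PHYS_PAGES`. Note that the folder total sums apparent file sizes, so it can differ slightly from what `du` reports (which counts allocated disk blocks).

```python
import os
import shutil


def logical_cpu_count():
    """Number of logical processors, similar to counting 'processor' lines in /proc/cpuinfo."""
    return os.cpu_count() or 1


def total_ram_bytes():
    """Total physical memory in bytes (assumes Linux sysconf names are available)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def folder_size_bytes(path):
    """Sum of the apparent sizes of all regular files below path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full_path = os.path.join(root, name)
            if not os.path.islink(full_path):
                total += os.path.getsize(full_path)
    return total


if __name__ == "__main__":
    gib = 1024 ** 3
    print("logical CPUs:", logical_cpu_count())
    print("RAM (GiB): %.1f" % (total_ram_bytes() / gib))
    usage = shutil.disk_usage("/")  # named tuple: total, used, free (bytes)
    print("disk / total (GiB): %.1f, free (GiB): %.1f" % (usage.total / gib, usage.free / gib))
    print("current folder (bytes):", folder_size_bytes("."))
```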




Monday, February 3, 2014

Converter Bed to GENCODE v3 GTF



Intro

The GENCODE v3 GTF format differs from the UCSC GFF/GTF format: GENCODE v3 GTF has 9 mandatory key-value pairs in the attribute column, which are not mandatory in UCSC GFF/GTF. The format is described here:

http://www.sanger.ac.uk/resources/databases/encode/gencodeformat.html

The following code converts a BED file to GENCODE v3 GTF format:

import java.util.Vector;
import java.util.regex.Pattern;

import com.cbrc.bean.TrxExonInfo;
import com.cbrc.common.CommonFunction;

public class BedTools_BedToGencodeGTFv3 {


        void convert_Bed_GencodeGtfv3(String fnmBed, String fnmGtf)
        {

                String tmp[];
                Pattern p = Pattern.compile("[\\t]+");
                // each element is one line of the BED file
                Vector<String> vectBedStr = CommonFunction.readlinesOfAfile(fnmBed);
                StringBuffer buf = new StringBuffer();
                for(int i=0;i<vectBedStr.size();i++)
                {
                        // split the line into the 12 BED columns
                        tmp = p.split(vectBedStr.get(i), 12);
                        TrxExonInfo trx = new TrxExonInfo(tmp[0], tmp[1], tmp[2], tmp[3],
                                        tmp[4], tmp[5], tmp[6], tmp[7], tmp[8], tmp[9], tmp[10], tmp[11]) ;


                        // create 1 entry for transcript
                        // (BED starts are 0-based, GTF starts are 1-based, hence the +1)

                        buf.append(trx.getChrom()+"\t"+"LLlab"+"\t"+"transcript"+ "\t"+
                                        (trx.getStart()+1)+"\t"+trx.getEnd()+"\t"+
                                        trx.getScore()+"\t"+trx.getStrand()+"\t" +
                                        "."+"\t"+ getAdditionalInfoMandatory(trx.getName() ) +"\n");

                        // create multiple entry for exons

                        for(int e=0; e<trx.getExonStarts().size(); e++)
                        {
                                buf.append(trx.getChrom()+"\t"+"LLlab"+"\t"+"exon"+ "\t"+
                                                (trx.getExonStarts().get(e)+1)+"\t"+trx.getExonEnds().get(e)+"\t"+
                                                trx.getScore()+"\t"+trx.getStrand()+"\t" +
                                                "."+"\t"+ getAdditionalInfoMandatory(trx.getName() ) +"\n");

                        }
                }



                CommonFunction.writeContentToFile(fnmGtf, buf+"");

        }

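        // Builds the 9 mandatory GENCODE attributes; the transcript name is reused
        // as gene/transcript id and name. Text values are wrapped in double quotes,
        // as GTF expects; the numeric level is left unquoted.
        String getAdditionalInfoMandatory(String trxID)
        {
                return
                        " gene_id \""    + trxID + "\";" +
                        " transcript_id \"" + trxID + "\";" +
                        " gene_type \""    + "RNA" + "\";" +
                        " gene_status \""    + "KNOWN" + "\";" +
                        " gene_name \""    + trxID + "\";" +
                        " transcript_type \""    + "RNA" + "\";" +
                        " transcript_status \""    + "KNOWN" + "\";" +
                        " transcript_name \""    + trxID + "\";" +
                        " level "    + "1" + ";" ;

        }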


        public static void main(String[] args) {
                BedTools_BedToGencodeGTFv3 obj = new BedTools_BedToGencodeGTFv3();
                obj.convert_Bed_GencodeGtfv3(args[0],   args[1] );
                // example
//              obj.convert_Bed_GencodeGtfv3("./test.bed", "./coding.withrpt.bed.wholebody.bed.gtf3" ); // "./coding.withrpt.bed.wholebody.bed"
//              obj.convert_Bed_GencodeGtfv3("./noncoding.withrpt.bed.wholebody.bed.bed", "./noncoding.withrpt.bed.wholebody.bed.gtf3" );

        }

}
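
The Java class above relies on `TrxExonInfo`, which is not shown, to turn the BED columns into exon coordinates. As a rough sketch of what that step presumably involves, the Python function below derives transcript and exon coordinates from one standard BED12 line: each exon starts at chromStart plus its blockStarts offset, and the 0-based BED start becomes a 1-based GTF start. The function name and the example line are hypothetical, and the sketch stops at the first eight GTF columns (no attribute column).

```python
def bed12_to_gtf_rows(bed_line, source="LLlab"):
    """Return (chrom, source, feature, start, end, score, strand, frame) tuples
    for one BED12 line: one 'transcript' row followed by one 'exon' row per block."""
    fields = bed_line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError("expected 12 tab-separated BED columns")
    chrom, name, score, strand = fields[0], fields[3], fields[4], fields[5]
    chrom_start, chrom_end = int(fields[1]), int(fields[2])
    # blockSizes and blockStarts are comma-separated and may end with a trailing comma
    block_sizes = [int(x) for x in fields[10].split(",") if x]
    block_starts = [int(x) for x in fields[11].split(",") if x]

    # BED is 0-based, half-open; GTF is 1-based, inclusive: start + 1, end unchanged
    rows = [(chrom, source, "transcript", chrom_start + 1, chrom_end, score, strand, ".")]
    for size, offset in zip(block_sizes, block_starts):
        exon_start = chrom_start + offset  # 0-based exon start
        rows.append((chrom, source, "exon", exon_start + 1, exon_start + size,
                     score, strand, "."))
    return rows


# Hypothetical two-exon transcript
example = "chr1\t1000\t5000\ttx1\t0\t+\t1000\t5000\t0\t2\t500,1000,\t0,3000,"
for row in bed12_to_gtf_rows(example):
    print("\t".join(str(value) for value in row))
```

For the example line, the transcript spans 1001-5000 and the two exons span 1001-1500 and 4001-5000 in GTF coordinates.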

Sunday, February 2, 2014

MATLAB string array using a cell array


y=[1:10];
mycell = cell(10,1);
for i=1:5 
   mycell(i)=cellstr('mRNA prom');
end
for i=6:10 
   mycell(i)=cellstr('lncRNA prom');
end   
boxplot(y , mycell ,'notch','on','whisker',1);