Posts

Showing posts from 2014

JAVA pattern match pattern search split example

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Check whether the input contains a match for a pattern
Pattern curPat = Pattern.compile("tanvir" + "(.)+" + "tanvir");
Matcher mymatcher;
String input = "tanvirAAAAAAAtanvir";
mymatcher = curPat.matcher(input);
if (mymatcher.find()) {
    System.out.println("Matched pattern: " + input);
}

// List every hit with its position in the input string
m = ConstantValue.patReplicaFantom.matcher(colNameUnReadable);
while (m.find()) {
    System.out.print("Start index: " + m.start());
    System.out.print(" End index: " + m.end());
    System.out.println(" Found: " + m.group());
}

// Split an input according to a pattern
Pattern patFastaHeade = Pattern.compile("[>_]+");
String text = ">aaaa";
String tmp[];
tmp = patFastaHeade.split(text);
for( int i=
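Since the excerpt above is cut off and refers to names that are not shown (`ConstantValue.patReplicaFantom`, `colNameUnReadable`), here is a minimal self-contained sketch of the same three operations: test for a match, list hits with positions, and split. The class name `RegexSketch`, its helper methods, and the sample strings other than `tanvirAAAAAAAtanvir` are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original post.
class RegexSketch {
    // True when the pattern occurs anywhere in the input (find, not a full match).
    static boolean contains(Pattern pattern, String input) {
        return pattern.matcher(input).find();
    }

    // Every hit as "start-end:text"; end is exclusive, as returned by Matcher.end().
    static List<String> hits(Pattern pattern, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            out.add(m.start() + "-" + m.end() + ":" + m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        Pattern wrapped = Pattern.compile("tanvir(.)+tanvir");
        System.out.println(contains(wrapped, "tanvirAAAAAAAtanvir"));

        for (String hit : hits(Pattern.compile("A+"), "xxAAyAz")) {
            System.out.println(hit);
        }

        // A delimiter at the very start of the input yields an empty first token.
        String[] parts = Pattern.compile("[>_]+").split(">aaaa_bbb");
        for (int i = 0; i < parts.length; i++) {
            System.out.println(i + ": '" + parts[i] + "'");
        }
    }
}
```

Note that `split` on a FASTA-style header such as `>aaaa` returns an empty string as its first element, so the useful tokens start at index 1.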

CAGE / RNA-seq normalization

Various normalization techniques in different tools

1. DESeq
Input: an N by C count matrix.
N rows: one per gene.
C columns: one per sample (the replicates of the different conditions).
Each cell holds the integer tag count, or whatever value you want to normalize.

Algo:
1. For each row of the count matrix, calculate the geometric mean (GM).
2. Divide each value by the GM of its row; the median of these ratios within a column is the size factor (SF) of that column.
3. Divide each count value by the SF of its column.

MATLAB code
function doDESEQnorm
mymat=[
    0    0    0    0    0    0    1;
    92    161    76    70    140    88    70;
    5    1    0    0    4    0    0;
    0    2    1    2    1    0    0
];
% This shows four genes in two conditions, untreated and treated:
% the first four columns are "untreated" replicates and the last three columns are "treated".
sizeRow = size(mymat,1);
sizeCol = size(mymat,2);
%% estimateSizeFactors
%% 1 - Find Geometric Mean (GM) of Each row
gmRow = zeros(1,sizeRow);
for i=1:si
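The three steps above can be sketched in Java as follows. This is an illustration of the median-of-ratios idea, not the truncated MATLAB function and not DESeq's own code; the class name `SizeFactorSketch` and its method names are assumptions. Rows that contain a zero have a geometric mean of zero, so this sketch skips them when computing the size factors.

```java
import java.util.Arrays;

// Hypothetical illustration class, not part of the original post.
class SizeFactorSketch {
    // Size factor per column: median over usable rows of count / row geometric mean.
    static double[] sizeFactors(double[][] counts) {
        int nCols = counts[0].length;
        double[][] ratios = new double[nCols][counts.length];
        int used = 0;
        for (double[] row : counts) {
            double logSum = 0.0;
            boolean hasZero = false;
            for (double v : row) {
                if (v <= 0) { hasZero = true; break; }
                logSum += Math.log(v);
            }
            if (hasZero) continue; // geometric mean would be 0, ratio undefined
            double gm = Math.exp(logSum / nCols);
            for (int j = 0; j < nCols; j++) ratios[j][used] = row[j] / gm;
            used++;
        }
        double[] sf = new double[nCols];
        for (int j = 0; j < nCols; j++) {
            sf[j] = median(Arrays.copyOf(ratios[j], used));
        }
        return sf;
    }

    // Median of the given values; sorts the array passed in. NaN if it is empty.
    static double median(double[] values) {
        Arrays.sort(values);
        int n = values.length;
        if (n == 0) return Double.NaN;
        return n % 2 == 1 ? values[n / 2] : (values[n / 2 - 1] + values[n / 2]) / 2.0;
    }

    // Step 3: divide every count by the size factor of its column.
    static double[][] normalize(double[][] counts) {
        double[] sf = sizeFactors(counts);
        double[][] out = new double[counts.length][];
        for (int i = 0; i < counts.length; i++) {
            out[i] = new double[counts[i].length];
            for (int j = 0; j < counts[i].length; j++) out[i][j] = counts[i][j] / sf[j];
        }
        return out;
    }

    public static void main(String[] args) {
        // Same matrix as in the MATLAB excerpt above.
        double[][] mymat = {
            {0, 0, 0, 0, 0, 0, 1},
            {92, 161, 76, 70, 140, 88, 70},
            {5, 1, 0, 0, 4, 0, 0},
            {0, 2, 1, 2, 1, 0, 0}
        };
        System.out.println(Arrays.toString(sizeFactors(mymat)));
    }
}
```

With the example matrix only the second row is free of zeros, so the size factors are driven by that single gene; real count tables have many usable rows.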

Expression analysis with R: package edgeR

Sample code for edgeR and DESeq for finding differentially expressed genes

Best step-by-step code:
http://www.nbic.nl/uploads/media/Practical_DiffExpr.pdf
Comparison of edgeR and DESeq:
http://gettinggeneticsdone.blogspot.com/2012/09/deseq-vs-edger-comparison.html

# edgeR
========
## Make design matrix
condition <- relevel(factor(meta$condition), ref = "untreated")
libType <- factor(meta$libType)
edesign <- model.matrix(~ libType + condition)
## Make new DGEList, normalize by library size, and estimate dispersion allowing possible trend with average count size
e <- DGEList(counts = counttable)
e <- calcNormFactors(e)
e <- estimateGLMCommonDisp(e, edesign)
e <- estimateGLMTrendedDisp(e, edesign)
e <- estimateGLMTagwiseDisp(e, edesign)
## MDS Plot
plotMDS(e, main = "edgeR MDS Plot")
## Biological coefficient of variation plot
plotBCV(e, cex = 0.4, main = &

BLAST INSTALLATION TUTORIAL

Formatting Database for Blast+
===========================
Before Blast/Blast+ can search a set of sequences, the FASTA file has to be formatted as a BLAST database. Let's do it for UniProt:

1. Download UniProt from the link provided
=============================
ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz

2. Issue the following command for blast+ (on the decompressed FASTA file):
=============================
/home/apps/ncbi/blast+/2.2.29/bin/makeblastdb -dbtype prot -in /path/to/uniprot.fasta

If you are familiar with legacy blast, use legacy_blast.pl in the following way to get the equivalent blast+ command:
$ /home/apps/ncbi/blast+/2.2.29/bin/legacy_blast.pl formatdb -i test.fasta -p T --path /home/apps/ncbi/blast+/2.2.29/bin --print_only
/home/apps/ncbi/blast+/2.2.29/bin/makeblastdb -dbtype prot -in test.fasta

Generate ssh key using ssh-keygen : generate both public and private key

http://www.ece.uci.edu/~chou/ssh-key.html

Setting up SSH public/private keys
SSH (Secure Shell) can be set up with public/private key pairs so that you don't have to type the password each time. Because SSH is the transport for other services such as SCP (secure copy), SFTP (secure file transfer), and other services (CVS, etc.), this can be very convenient and save you a lot of typing.

SSH Version 2
On the local machine, type the command shown after the % prompt. The other lines are what you might see as output or prompt.
Step 1:
% ssh-keygen -t dsa
Generating public/private dsa key pair.
Enter file in which to save the key (~/.ssh/id_dsa): (just type return)
Enter passphrase (empty for no passphrase):

Java Plots / graphs using jfreechart jfree chart

http://stackoverflow.com/questions/3587025/jfree-charts-sample-tutorial
http://www.screaming-penguin.com/node/4005
http://www.vogella.de/articles/JFreeChart/article.html
http://adityanivas3.blogspot.com/2008/09/jfree-chart-tutorial.html
http://www.javaworld.com/javaworld/jw-12-2002/jw-1227-opensourceprofile.html

There are even some examples right in the code, such as the one described here:
http://www.jfree.org/jfreechart/api/javadoc/org/jfree/chart/demo/TimeSeriesChartDemo1.html

// HOW TO CREATE CHART AND SAVE AS PNG
// Imports needed by this fragment (ChartUtilities is the class name in JFreeChart 1.0.x)
import java.io.File;
import org.jfree.chart.ChartUtilities;
import org.jfree.chart.JFreeChart;
import org.jfree.data.xy.XYDataset;

        // createDataset() and createChart() are defined elsewhere in the post
        final XYDataset dataset = createDataset();
        final JFreeChart chart = createChart(dataset);
        try {
            File f1 = new File("myline.png");
            // Write the chart as a 500 x 400 pixel PNG
            ChartUtilities.saveChartAsPNG(f1, chart, 500, 400);
        } catch (Exception e) {
            e.printStackTrace();
        }

PHP XAMPP APACHE CONTROL START STOP

Open the control panel
sudo /opt/lampp/share/xampp-control-panel/xampp-control-panel

How to start / stop the Apache server in XAMPP
sudo /opt/lampp/lampp start
sudo /opt/lampp/lampp stop

How to change the default Apache port 80 to another port:
Find the line "Listen 80" in any of these files and change it to the new port number.
/opt/lampp/etc/httpd.conf
/opt/lampp/etc/original/httpd.conf

How to install linux iso image

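1. Download your iso file, say myfile.iso
2. Make a directory where it will be mounted
sudo mkdir /mnt/iso
3. Mount the iso file there
sudo mount -t iso9660 -o loop /home/Download/myfile.iso /mnt/iso/
4. Install it by running the installer found under /mnt/iso (its file name depends on the image; the .iso file itself is not executable, so ./myfile.iso will not work)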

Tutorial Bedtools Bedtool

#!/bin/sh
##### PARAMETERS for INTERSECT ######:
## require the same strand
# -s
## print A, with number of hits
# -c
## print A and B, with number of bp overlap
# -wo
##### EXAMPLE ####
f1="coding.withrpt.bed.tmp"
f2="noncoding.withrpt.bed.tmp"
f1Out="coding.withrpt.bed"
f2Out="noncoding.withrpt.bed"
fCDS="refseqcoding.CDS.bed"
# bedtools intersect -s -c  -a $f1 -b $fCDS  > $f1Out
# bedtools intersect -s -wo  -a $f1 -b $fCDS  > $f1Out.bp
# bedtools intersect -s -c  -a $f2 -b $fCDS  > $f2Out
# bedtools intersect -s -wo  -a $f2 -b $fCDS  > $f2Out.bp
# bedtools intersect -s -wo -a test1.bed -b test2.bed
##### PARAMETERS for GETFASTA ######:
## require the same strand
# -s
## input and output file
# -fi  -fo
##### EXAMPLE ####
fAllgeneBed="allGene.bed"
fAllgeneFasta="allGene.fasta"
# bedtools getfasta  -s  -fi /home/data/genomes/hg19/hg19.fa -bed $fAllgeneBed -fo $fAllgeneFasta
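To make clear what `intersect -s -c` reports, here is a small Java sketch of the counting rule: for each A feature, count the B features on the same chromosome and strand whose half-open BED intervals overlap it. This is an illustration only, not how bedtools is implemented; the class `IntersectCountSketch` and the `Interval` type are assumptions, and a brute-force loop stands in for the indexed search a real tool would use.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class, not part of the original post or of bedtools.
class IntersectCountSketch {
    // Minimal BED record: 0-based start, exclusive end.
    static final class Interval {
        final String chrom;
        final int start;
        final int end;
        final char strand;

        Interval(String chrom, int start, int end, char strand) {
            this.chrom = chrom;
            this.start = start;
            this.end = end;
            this.strand = strand;
        }
    }

    // Overlap of at least 1 bp on the same chromosome and strand.
    // Book-ended intervals (a.end == b.start) do not overlap.
    static boolean overlapsSameStrand(Interval a, Interval b) {
        return a.chrom.equals(b.chrom)
                && a.strand == b.strand
                && a.start < b.end
                && b.start < a.end;
    }

    // One count per A feature, in the order of the A list.
    static int[] countHits(List<Interval> aFeatures, List<Interval> bFeatures) {
        int[] counts = new int[aFeatures.size()];
        for (int i = 0; i < aFeatures.size(); i++) {
            for (Interval b : bFeatures) {
                if (overlapsSameStrand(aFeatures.get(i), b)) counts[i]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Interval> a = Arrays.asList(new Interval("chr1", 100, 200, '+'));
        List<Interval> b = Arrays.asList(
                new Interval("chr1", 150, 250, '+'),
                new Interval("chr1", 150, 250, '-'));
        System.out.println(Arrays.toString(countHits(a, b)));
    }
}
```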

RNA-seq expression software install and usage

Some basic terminology used in RNA-seq papers
http://thegenomefactory.blogspot.com/2013/08/paired-end-read-confusion-library.html

Basic idea to calculate RPKM (Reads Per Kilobase of exon per Million mapped reads)
http://seqanswers.com/forums/showthread.php?t=29549

First of all, Cufflinks uses FPKM (Fragments Per Kilobase of exon per Million mapped fragments) instead of RPKM (Reads Per Kilobase of exon per Million mapped reads) to avoid confusion when dealing with paired-end data. Secondly, Cufflinks uses corrections when calculating FPKM, so if you do a simple calculation it will not match that of Cufflinks. Anyway, the crude calculation for a gene would be (NOT the one that Cufflinks uses):

FPKM = [f / (e / 1000)] / (m / 1,000,000)
f - number of fragments mapping to gene
e - exonic length of gene
m - total number of mapped fragments

If you would like to know more about the corrections that Cufflinks applies to FPKM, see this paper: Trapnell C, Willi
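The crude formula quoted above translates directly into code. The sketch below is only that crude calculation (as the quoted text says, it is not what Cufflinks computes); the class and method names are assumptions, and the numbers in `main` are made up for illustration.

```java
// Hypothetical illustration class, not part of the original post.
class FpkmSketch {
    // FPKM = [f / (e / 1000)] / (m / 1,000,000)
    //   fragments             f: fragments mapping to the gene
    //   exonicLengthBp        e: exonic length of the gene in bp
    //   totalMappedFragments  m: total number of mapped fragments
    static double fpkm(long fragments, long exonicLengthBp, long totalMappedFragments) {
        double perKilobase = fragments / (exonicLengthBp / 1000.0);
        return perKilobase / (totalMappedFragments / 1_000_000.0);
    }

    public static void main(String[] args) {
        // 100 fragments on a 2,000 bp gene with 10 million mapped fragments:
        // 100 / 2 = 50 per kilobase, 50 / 10 = 5 per kilobase per million.
        System.out.println(fpkm(100, 2000, 10_000_000)); // prints 5.0
    }
}
```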

cluster user manual : Noor use matlab at noor : MATLAB GUI LOAD FROM CLUSTER

1. First you need an account at noor. If you do not have one, call ithelpdesk.
2. Then log in:
ssh -X yourid@noor-login3.kaust.edu.sa
3. Load matlab under your account:
module load matlab/R2012b
4. Run matlab with the following command:
/opt/share/MATLAB/matlab.R2012b

Load matlab GUI
===========
ssh -X yourid@noor-login3.kaust.edu.sa
module load matlab
bsub -XF -I -q rh6_interactive matlab -desktop

The basics of using the noor cluster:
http://rcweb.kaust.edu.sa/KAUST/ResearchComputing/wiki/NoorGuide
The basics of using any tool on the noor cluster:
http://rcweb.kaust.edu.sa/KAUST/ResearchComputing/wiki/up2speed

Linux command to check number of processor core , memory ram size, disk usage disk size

To get the number of processors
====================
grep -c '^processor' /proc/cpuinfo

To get the details of each processor and the number of cores
=======================
# very detailed information
less /proc/cpuinfo
# summary information: how many cores/processors etc.
lscpu

To get the RAM size
=============
1. In human readable format
free -g
free -m
2. In more detail
less /proc/meminfo

To get the disk size
=============
1. In human readable format
df -h

Total size of a folder
===============
Enter the folder and then
du -ch | grep total
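The same figures can be read from inside a Java program, which is handy when a job has to size its thread pool or check free space before writing. This is a minimal sketch assuming a Linux host; the class `SystemInfoSketch` and its methods are assumptions. `availableProcessors()` reports what the JVM may use, which can be lower than the count in /proc/cpuinfo when the process is restricted.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical illustration class, not part of the original post.
class SystemInfoSketch {
    // Number of logical processors available to this JVM.
    static int logicalProcessors() {
        return Runtime.getRuntime().availableProcessors();
    }

    // Parses a line such as "MemTotal:       16316412 kB"; returns -1 if it is absent.
    static long memTotalKb(List<String> meminfoLines) {
        for (String line : meminfoLines) {
            if (line.startsWith("MemTotal:")) {
                String[] parts = line.trim().split("\\s+");
                return Long.parseLong(parts[1]);
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Logical processors: " + logicalProcessors());

        // /proc/meminfo exists on Linux only, so check before reading it.
        Path meminfo = Paths.get("/proc/meminfo");
        if (Files.isReadable(meminfo)) {
            System.out.println("MemTotal (kB): " + memTotalKb(Files.readAllLines(meminfo)));
        }

        // Size and usable space of the file system that holds "/".
        File root = new File("/");
        System.out.println("Root total bytes: " + root.getTotalSpace());
        System.out.println("Root usable bytes: " + root.getUsableSpace());
    }
}
```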

Converter Bed to GENCODE v3 GTF

Intro
Gencode V3 GTF format is different from UCSC GFF/GTF. Gencode V3 GTF has 9 mandatory key=value pairs, which are not mandatory for UCSC GFF/GTF. Here is the description of the format:
http://www.sanger.ac.uk/resources/databases/encode/gencodeformat.html

Here is code to convert a bed file to Gencode v3 GTF format:

import java.util.Vector;
import java.util.regex.Pattern;
import com.cbrc.bean.TrxExonInfo;
import com.cbrc.common.CommonFunction;

public class BedTools_BedToGencodeGTFv3 {
        void convert_Bed_GencodeGtfv3( String fnmBed, String fnmGtf)
        {
                String tmp[];
                Pattern p = Pattern.compile("[\\t]+");
                Vector<String> vectBedStr = CommonFunction.readlinesOfAfile(fnmBed);
                StringBuffer buf = new StringBuffer();
                for(int i=0;i<vectBedStr.size();i++)
                {
                        tmp = p.split(vectBedStr.get(i), 12);
                        TrxExonInfo trx
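The converter above is cut off and depends on classes that are not shown (`TrxExonInfo`, `CommonFunction`), so the following is a separate, self-contained sketch of the coordinate arithmetic only: turning the blocks of one BED12 line into GTF exon lines. It is not the author's method. The class name, the `bed` value in the source column, and the choice to reuse the BED name for both `gene_id` and `transcript_id` are assumptions, and the other mandatory Gencode attributes are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class, not the converter from the original post.
class BedToGtfSketch {
    // Converts one tab-separated BED12 line into one GTF exon line per block.
    // BED is 0-based with an exclusive end; GTF is 1-based with an inclusive end.
    static List<String> bed12ToExonLines(String bedLine) {
        String[] f = bedLine.split("\t");
        String chrom = f[0];
        int chromStart = Integer.parseInt(f[1]);
        String name = f[3];
        String strand = f[5];
        int blockCount = Integer.parseInt(f[9]);
        String[] sizes = f[10].split(",");   // a trailing comma is ignored by split
        String[] starts = f[11].split(",");  // block starts are relative to chromStart

        // Assumption: only the two basic attributes are written here.
        String attrs = "gene_id \"" + name + "\"; transcript_id \"" + name + "\";";

        List<String> lines = new ArrayList<>();
        for (int i = 0; i < blockCount; i++) {
            int start0 = chromStart + Integer.parseInt(starts[i]);
            int end = start0 + Integer.parseInt(sizes[i]);
            lines.add(String.join("\t", chrom, "bed", "exon",
                    String.valueOf(start0 + 1), String.valueOf(end),
                    ".", strand, ".", attrs));
        }
        return lines;
    }

    public static void main(String[] args) {
        String bed = "chr1\t100\t500\tTx1\t0\t+\t100\t500\t0\t2\t50,100\t0,300";
        for (String line : bed12ToExonLines(bed)) {
            System.out.println(line);
        }
    }
}
```

The key step is the `+ 1` on the start only: a BED block covering 0-based positions 100 to 149 (end 150, exclusive) becomes GTF positions 101 to 150.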

Linux process management in foreground background

http://stackoverflow.com/questions/9190151/how-to-run-a-shell-script-in-the-backgroung-and-get-no-output
http://linuxg.net/how-to-manage-background-and-foreground-processes/
http://unix.stackexchange.com/questions/45025/how-to-suspend-and-bring-a-background-process-to-foreground

How to run a job in background
======================
nohup /path/to/your/script.sh > /dev/null 2>&1 &

How to view background processes
======================
jobs

How to move a job from foreground to background and vice versa
=======================================
fg %jobid
bg %jobid

How to kill background jobs
======================
kill %job_id
# kill -9 %job_id forces termination if the job ignores the default signal.
# Note: kill -19 %job_id sends SIGSTOP on most Linux systems, which only suspends the job.

Linux GUI based browsing using vncserver and vncviewer

Main Source
==========
http://rcweb.kaust.edu.sa/KAUST/ResearchComputing/wiki/Workstation#HowtoruntheVNCServerLinuxWorkstation

Example:
========
user: alamt
serverIP: 10.68.170.137

Step1: Login to server
==============
ssh alamt@10.68.170.137

Step2: Start vncServer in login server
========================
vncserver -geometry 1600x1024

Step3: Check the port no for vncServer in login server
=================================
ps -ef | grep vnc | grep alamt
// It will show a long line that includes the following port info:
-rfbport 5905 (so in this example the port no for vncserver is 5905)
Xvnc :5 (so in this example the screen ID is 5; the port is 5900 plus the screen ID)

Step4: From client, run vncViewer
=====================
vncviewer 10.68.170.137:5905

Step5: Kill/Stop vncserver from login server
===========================
vncserver -kill :5