Skip to main content

Bioinformatics tool


0. BIOPYTHON mafIO
==================

    def mafParsing(self,chrName ):
        idx = MafIndex(chrName + ".mafindex", chrName + ".maf", "hg19." + chrName )
        multiple_alignment = idx.get_spliced([60000] ,  [60020] , strand = "-")      
        AlignIO.write(multiple_alignment, "hg19_All.fa", "fasta")
        total_bases_tupBel1  = 0;
        for seqrec in multiple_alignment:
                if seqrec.id.startswith("tupBel1"): # mm4  tupBel1  tanvir
                    # don't count gaps as bases
                    total_bases_tupBel1 += len( str(seqrec.seq).replace("-", "") )
                  
        print "total of %s base align" %total_bases_tupBel1  + " %s"   %total_bases_tupBel1

1. mafsInRegion
=====================================

desc:  Find multi alignment output in different species

usage:  ./mafsInRegion -keepInitialGaps   -outDir test.bed out_dir chr22.maf


precaution:

1. use -outDir if you want gene name specific output in seperate file (1 file per 1 sample).




2. liftOver
=====================================

desc: Lift  co-ordinate of one assembly over another assembly. Give output according to similarity match.


usage: ./liftOver old.chain new.chain mapped.out unMapped.out


precaution:

for single species use similaritythreshold 0.95
for multiple species use similarity thr 0.10
It is not optimized for multiple species


3. match@ TRANSFAC professional
=====================================

desc: Find hit location of matrix over sequence


usage: ./match  matrix.dat promoter.seq  map.out matrix.profile


precaution:

if a matrix's profile is not available in profile file, this matrix will not be mapped over sequence.

4.OrthoMCL
==================================

desc: Find orthologous gene from >=2 species

installation:

a.  download  the unsupported version. If you download full version you have to install mysql and others.

http://orthomcl.org/common/downloads/software/unsupported/v1.4/

There are 2 zip files. One for OrthoMCL and second one for MCL.

b. compile MCL: (Read the install file for help)
      unzip
      ./configure
      make
      make install
c. change the path name in OrthoMCL folder file orthomcl_module.pm

for variable $BLASTALL , $FORMATDB , $MCL, $$PATH_TO_ORTHOMCL, $ORTHOMCL_DATA_DIR

d.  Keep all the fasta file in $ORTHOMCL_DATA_DIR  folder.

usage: (See README under OrthoMCLv1.4)

perl orthomcl.pl --mode 1 --fa_files Ath.fa,Hsa.fab


precaution


5. blastp
==========================

blastp -evalue 1e-05 -query testBee.prot.fa -subject testBee.prot.fa -outfmt 6 -out myout.txt

6. apriori algorithm
===========================
// min support 90% , maximal subset , unlimited #item in a itemset.
apriori.exe -tm -s90  gene_tf.input  gene_tf_90.out








Comments

Popular posts from this blog

MATLAB cross validation

// use built-in function samplesize = size( matrix , 1); c = cvpartition(samplesize,  'kfold' , k); % return the indexes on each fold ///// output in matlab console K-fold cross validation partition              N: 10    NumTestSets: 4      TrainSize: 8  7  7  8       TestSize: 2  3  3  2 ////////////////////// for i=1 : k    trainIdxs = find(training(c,i) ); %training(c,i);  // 1 means in train , 0 means in test    testInxs  = find(test(c,i)       ); % test(c,i);       // 1 means in test , 0 means in train    trainMatrix = matrix (  matrix(trainIdxs ), : );    testMatrix  = matrix (  matrix(testIdxs  ), : ); end //// now calculate performance %%  calculate performance of a partition     selectedKfoldSen=[];selectedKfoldSpe=[];selectedKfoldAcc=[];     indexSen=1;indexSpe=1;indexAcc=1;     if ( kfold == (P+N) )% leave one out         sensitivity = sum(cvtp) /( sum(cvtp) + sum(cvfn) )         specificity = sum(cvtn) /( sum(cvfp) + sum(cvtn) )         acc

R tutorial

Install R in linux ============ In CRAN home page, the latest version is not available. So, in fedora, Open the terminal yum list R  --> To check the latest available version of r yum install R --> install R version yum update R --> update current version to latest one 0 find help ============ ?exact topic name (  i.e.   ?mean ) 0.0 INSTALL 3rd party package  ==================== install.packages('mvtnorm' , dependencies = TRUE , lib='/home/alamt/myRlibrary/')   #  install new package BED file parsing (Always use read.delim it is the best) library(MASS) #library(ggplot2) dirRoot="D:/research/F5shortRNA/TestRIKEN/Rscripts/" dirData="D:/research/F5shortRNA/TestRIKEN/" setwd(dirRoot) getwd() myBed="test.bed" fnmBed=paste(dirData, myBed, sep="") # ccdsHh19.bed   tmp.bed ## Read bed use read.delim - it is the  best mybed=read.delim(fnmBed, header = FALSE, sep = "\t", quote = &q