Bioinformatics tool

0. BIOPYTHON mafIO
==================

    def mafParsing(self,chrName ):
        idx = MafIndex(chrName + ".mafindex", chrName + ".maf", "hg19." + chrName )
        multiple_alignment = idx.get_spliced([60000] , [60020] , strand = "-")
        AlignIO.write(multiple_alignment, "hg19_All.fa", "fasta")
        total_bases_tupBel1 = 0;
        for seqrec in multiple_alignment:
                if seqrec.id.startswith("tupBel1"): # mm4 tupBel1 tanvir
                    # don't count gaps as bases
                    total_bases_tupBel1 += len( str(seqrec.seq).replace("-", "") )

        print "total of %s base align" %total_bases_tupBel1 + " %s"   %total_bases_tupBel1
1. mafsInRegion
=====================================

desc: Find multi alignment output in different species

usage: ./mafsInRegion -keepInitialGaps   -outDir test.bed out_dir chr22.maf

precaution:

1. use -outDir if you want gene name specific output in seperate file (1 file per 1 sample).

2. liftOver
=====================================

desc: Lift co-ordinate of one assembly over another assembly. Give output according to similarity match.

usage: ./liftOver old.chain new.chain mapped.out unMapped.out

precaution:

for single species use similaritythreshold 0.95
for multiple species use similarity thr 0.10
It is not optimized for multiple species

3. match@ TRANSFAC professional
=====================================

desc: Find hit location of matrix over sequence

usage: ./match matrix.dat promoter.seq map.out matrix.profile

precaution:

if a matrix's profile is not available in profile file, this matrix will not be mapped over sequence.

4.OrthoMCL
==================================

desc: Find orthologous gene from >=2 species

installation:

a. download the unsupported version. If you download full version you have to install mysql and others.

http://orthomcl.org/common/downloads/software/unsupported/v1.4/

There are 2 zip files. One for OrthoMCL and second one for MCL.

b. compile MCL: (Read the install file for help)
      unzip
      ./configure
      make
      make install
c. change the path name in OrthoMCL folder file orthomcl_module.pm

for variable $BLASTALL , $FORMATDB , $MCL, $$PATH_TO_ORTHOMCL, $ORTHOMCL_DATA_DIR

d. Keep all the fasta file in $ORTHOMCL_DATA_DIR folder.

usage: (See README under OrthoMCLv1.4)

perl orthomcl.pl --mode 1 --fa_files Ath.fa,Hsa.fab

precaution

5. blastp
==========================

blastp -evalue 1e-05 -query testBee.prot.fa -subject testBee.prot.fa -outfmt 6 -out myout.txt

6. apriori algorithm
===========================
// min support 90% , maximal subset , unlimited #item in a itemset.
apriori.exe -tm -s90 gene_tf.input gene_tf_90.out

My Blog

Search This Blog

Bioinformatics tool

Comments

Post a Comment

Popular posts from this blog

MATLAB cross validation

MATLAB optimization toolbox usage with genetic algorithm

SLURM tutorial : Basic commands