Wednesday, July 25, 2012

Bioinformatics tool


0. BIOPYTHON mafIO
==================

    def mafParsing(self,chrName ):
        idx = MafIndex(chrName + ".mafindex", chrName + ".maf", "hg19." + chrName )
        multiple_alignment = idx.get_spliced([60000] ,  [60020] , strand = "-")      
        AlignIO.write(multiple_alignment, "hg19_All.fa", "fasta")
        total_bases_tupBel1  = 0;
        for seqrec in multiple_alignment:
                if seqrec.id.startswith("tupBel1"): # mm4  tupBel1  tanvir
                    # don't count gaps as bases
                    total_bases_tupBel1 += len( str(seqrec.seq).replace("-", "") )
                  
        print "total of %s base align" %total_bases_tupBel1  + " %s"   %total_bases_tupBel1

1. mafsInRegion
=====================================

desc:  Find multi alignment output in different species

usage:  ./mafsInRegion -keepInitialGaps   -outDir test.bed out_dir chr22.maf


precaution:

1. use -outDir if you want gene name specific output in seperate file (1 file per 1 sample).




2. liftOver
=====================================

desc: Lift  co-ordinate of one assembly over another assembly. Give output according to similarity match.


usage: ./liftOver old.chain new.chain mapped.out unMapped.out


precaution:

for single species use similaritythreshold 0.95
for multiple species use similarity thr 0.10
It is not optimized for multiple species


3. match@ TRANSFAC professional
=====================================

desc: Find hit location of matrix over sequence


usage: ./match  matrix.dat promoter.seq  map.out matrix.profile


precaution:

if a matrix's profile is not available in profile file, this matrix will not be mapped over sequence.

4.OrthoMCL
==================================

desc: Find orthologous gene from >=2 species

installation:

a.  download  the unsupported version. If you download full version you have to install mysql and others.

http://orthomcl.org/common/downloads/software/unsupported/v1.4/

There are 2 zip files. One for OrthoMCL and second one for MCL.

b. compile MCL: (Read the install file for help)
      unzip
      ./configure
      make
      make install
c. change the path name in OrthoMCL folder file orthomcl_module.pm

for variable $BLASTALL , $FORMATDB , $MCL, $$PATH_TO_ORTHOMCL, $ORTHOMCL_DATA_DIR

d.  Keep all the fasta file in $ORTHOMCL_DATA_DIR  folder.

usage: (See README under OrthoMCLv1.4)

perl orthomcl.pl --mode 1 --fa_files Ath.fa,Hsa.fab


precaution


5. blastp
==========================

blastp -evalue 1e-05 -query testBee.prot.fa -subject testBee.prot.fa -outfmt 6 -out myout.txt

6. apriori algorithm
===========================
// min support 90% , maximal subset , unlimited #item in a itemset.
apriori.exe -tm -s90  gene_tf.input  gene_tf_90.out








Monday, July 16, 2012

java binary search





return the match index /  the next index if no match

int findDownstreamStart( int key)
    {
        int imax = vectNegStart.size();
        int imin = 0;
        int imid = (imin + imax) / 2;
       
        if( key <= vectNegStart.get(0))
            return 0;
        if( key >= vectNegStart.get(imax-1 ))
            return imax-1;
       
   
        // continue searching while [imin,imax] is not empty
        while (imax >= imin)
        {
            /* calculate the midpoint for roughly equal partition */
            imid = (imin + imax) / 2;

            // determine which subarray to search
            if      ( vectNegStart.get(imid) <  key)
                // change min index to search upper subarray
                imin = imid + 1;
            else if (  vectNegStart.get(imid) > key )
                // change max index to search lower subarray
                imax = imid - 1;
            else
                // key found at index imid
                return imid;
        }
       
        return imid;

    }

Sunday, July 15, 2012

java linkedhashmap linked hashmap



// create map

LinkedHashMap hashmapGene = new LinkedHashMap()();






// insert element into map

            if( hashmapGene.containsKey(geneID) )
            {               
                GeneInfo gene = hashmapGene.get(geneID);
                gene.determineClass(tmp[3]);               

            }else
            {
                hashmapGene.put( geneID , new GeneInfo(chrom , st,end, promStartCoord, promEndCoord ,
                        tmp[3],tmp[4],strand ,tmp[6],tmp[7],tmp[8],tmp[9],tmp[10],tmp[11] ,
                        strLine  ) );
            }
     

// retrieve element from map
    
            Set set = hashmapGene.entrySet();
            System.out.println("Total Unique entry:" + set.size() ) ;
            Iterator itr = set.iterator();
            while(itr.hasNext()){


                Map.Entry me = (Map.Entry) itr.next();
                String id = (String)me.getKey();
                GeneInfo geneObj = (GeneInfo) me.getValue();
               
            }