Wednesday, August 29, 2012

C/C++ header file inlcude "" OR<> how to use std ?



Suppose your project current directory is

0.   /home/myproj/

Let assume include foder are


1. /home/include1/
2. /home/include2/

==>  # include "myhead.h"
   search for directory in order 0 , then 1 , then 2

AS IT SEARCH CURRENT DIR FIRST, USUALLY user defined headers used in this way.

==>  # include
   search for directory in order 1 , then 2 , then 0

AS IT SEARCH INCLUDE DIR FIRST,  standard headers used in this way.

Header file extension:
==================
In C : In early days it was .hxx, .hpp . Then it was .h

    #include myheader.hxx
    #include myheader.hp
    #include myheader.hpp

In C++ : No extension. They are included in namespaces.

   #include
   using namespace std; // use original namespace

you can also use it in code as std::stringfunction

Tuesday, August 28, 2012

linux configuration file : bash_profile / bash_logout / bashrc


see also : http://www.troubleshooters.com/linux/prepostpath.htm

source : http://www.hypexr.org/bash_tutorial.php#whatis

In your home directory, 3 files have a special meaning to Bash, allowing you to set up your environment automatically when you log in and when you invoke another Bash shell, and allow you to execute commands when you log out.
These files may exist in your home directory, but that depends largely on the Linux distro you're using and how your sysadmin (if not you) has set up your account. If they're missing, Bash defaults to /etc/profile.
You can easily create these files yourself using your favorite texteditor. They are:
  • .bash_profile : read and the commands in it executed by Bash every time you log in to the system
  • .bashrc : read and executed by Bash every time you start a subshell
  • .bash_logout : read and executed by Bash every time a login shell exits
Bash allows 2 synonyms for .bash_profile : .bash_login and .profile. These are derived from the C shell's file named .login and from the Bourne shell and Korn shell files named .profile. Only one of these files is read when you log in. If .bash_profile isn't there, Bash will look for .bash_login. If that is missing too, it will look for .profile.
.bash_profile is read and executed only when you start a login shell (that is, when you log in to the system). If you start a subshell (a new shell) by typing bash at the command prompt, it will read commands from .bashrc. This allows you to separate commands needed at login from those needed when invoking a subshell.
However, most people want to have the same commands run regardless of whether it is a login shell or a subshell. This can be done by using the source command from within .bash_profile to execute .bashrc. You would then simply place all the commands in .bashrc.



Example .bash_profile
==================

# .bash_profile
if [ -f ~/.bashrc ]; then
    . ~/.bashrc
fi


Example .bashrc (Now add all your installation path in .bashrc)
============================================
 
# .bashrc
# _LIBRARY PATHs
PYTHONPATH=/home/tanviralam/kaustmachine/basedir/Python-2.7.6/

PATH=$PATH:$PYTHONPATH
export PATH


Sunday, August 26, 2012

Sun Grid Engine : SGE : Basic Commands


http://www.ats.ucla.edu/clusters/common/computing/batch/sge.htm

http://web.njit.edu/topics/HPC/basement/sge/SGE.html

http://star.mit.edu/cluster/docs/0.92rc2/guides/sge.html




0. First see available q : qstat -g c

1. ALWAYS USE -q queuename

2. To use any particular node use -l hostname (  -l hostname=himem03 )



qsub   -q  normal.q   -e pythonTeswt.err -o pythonTeswt.out   -cwd -S /bin/bash   -l hostname=himem03  testSingledataprep.sh


## using specific node

qsub   -q  normal.q   -e pythonTeswt.err -o pythonTeswt.out   -cwd -l hostname=himem03  -S /bin/bash   -l hostname=himem03  testSingledataprep.sh


python mafParser.py $mafRoot  $dataRoot  $fNameSpecies $fNameSubEXONLncrna  $fnmStatCount $fnmStatPrcnt $fnmStatSub $fnmSUBEXONStatDETECT  $fnapslncBed

/// check my submitted jobs

qstat
// alluser

qstat -u "*"

///// all details of user and nodes

userstat

/// details of queues

qstat -g c

// delete jobs

qdel jobID

// node informations
qhost

--------  RUN any binary installed in path -----------------------

qsub -q  normal.q -cwd -b y -o 1.out -e 1.err -q normal.q -N job1 binary
------------------  JAVA  -------------------------
#!/bin/sh
qsub -q  normal.q -e myErr.err -o myOut.out -S  /usr/bin/java Test.class

-------------------- PYTHON ----------------
#!/bin/sh
qsub -q  normal.q   -e myErr.err -o myOut.out  -s /usr/bin/python  test.py

--------------------  To run in cluster using several nodes ------------
-pe smp 32
// it will try to assign 32 nodes for this job

Saturday, August 25, 2012

code p-value calculation


  pvalue/p-value calculation

HYPERGEOMETRIC TEST


http://mengnote.blogspot.com/2012/12/calculate-correct-hypergeometric-p.html

# x, q     vector of quantiles representing the number of white balls drawn without replacement from an urn which contains both black and white balls.
# m     the number of white balls in the urn.
# n     the number of black balls in the urn.
# k     the number of balls drawn from the urn.
# phyper(q, m, n, k, lower.tail = TRUE, log.p = FALSE)

sucessDrawn=1
sucessTotal=2

sampleSize=24
populatonSizeWithoutSample=1751-sampleSize

phyper(sucessDrawn-1, sucessTotal, populatonSizeWithoutSample, sampleSize, lower.tail = FALSE, log.p = FALSE)

FISHER EXACT TEST


Bonferroni correction: Pvalue* (#of test cases)

http://www.une.edu.au/WebStat/unit_materials/c7_anova/oneway_bonferroni_adjust.htm

http://www.langsrud.com/stat/fisher.htm

Code done by  Benoit Marchand

/*
 * pvalcalc.cpp
 *      Author: 
Benoit Marchand
 */
#include
#include


    /*
     *                 CLASSA    CLASSB
     *                        -----------------
     *  PRESENT    |      |        |    TOTALpresent= n
     *  ABSENT      |          |        |    TOTALabsent
     *                         ------------------------------------------------
     *     K= TOTALclassA|        |  N = TOTALclass A + B
     * 

/*
 *  k - #PRESENT in CLASSA
 *  n - #PRESENT in CLASSA + CLASSB
 *
 *  K - #TOTAL sample in CLASSA
 *  N - #TOTAL sample in CLASSA + CLASSB
 */
double calcOneMotifPValue(int k,int n,int K,int N)
{
        double pv,t1,x;

/*
        k      n-k         |  n
        K-k    (N-K)-(n-k) |  N-n
        -------------------------
        K      N-K         |  N


        This calculates the Right Hand Side PValue.
        (LHS would be k=0,n.)
*/
        t1=lgamma(((float) K+1))+lgamma(((float) N-K+1))-lgamma(((float) N+1))+lgamma(((float) n+1))+lgamma(((float) N-n+1));
        for (pv=-1e100;k<=n && k<=K && (n-k)<=(N-K);k++) {
                x=t1-lgamma(((float) k+1))-lgamma(((float) K-k+1))-lgamma(((float) n-k+1))-lgamma(((float) N-K-n+k+1));
                pv=(pv>=x?pv+log(1.0+exp(x-pv)):x+log(1.0+exp(pv-x)));
        }

        // Bonferroni correction (pval * nSEQF)
        //rankDB[id].pvalue=((float) rankDB[id].nSEQF)*exp(pv);
        //rankDB[id].pvalue=exp(pv);
        return(exp(pv));
}


int main()
{
    printf("%.8f", calcOneMotifPValue( 75,100, 155, 200) );
}



////////////  JAVA VERSION 
import org.apache.commons.math.special.Gamma;

/*
     *                 CLASSA    CLASSB
     *                        -----------------
     *  PRESENT    |    k    |        |    TOTALpresent=n
     *  ABSENT      |          |        |    TOTALabsent
     *                         -------------------------------
     *         TOTALclassA  |        |      TOTALclass A + B
     *                 K                                    N
     */

    /*
     *  k - #PRESENT in CLASSA
     *  n - #PRESENT in CLASSA + CLASSB
     *
     *  K - #TOTAL sample in CLASSA
     *  N - #TOTAL sample in CLASSA + CLASSB
     */
    public static double calcRightsidePValue(int hitA,int hitAB,int totA,int totAB)
    {
            double pv,t1,x;

    /*
            k      n-k         |  n
            K-k    (N-K)-(n-k) |  N-n
            -------------------------
            K      N-K         |  N


            This calculates the Right Hand Side PValue.
            (LHS would be k=0,n.)
    */
            t1= Gamma.logGamma(((float) totA+1))+Gamma.logGamma(((float) totAB-totA+1))-Gamma.logGamma(((float) totAB+1))+Gamma.logGamma(((float) hitAB+1))+Gamma.logGamma(((float) totAB-hitAB+1));
            for (pv=-1e100;hitA<=hitAB && hitA<=totA && (hitAB-hitA)<=(totAB-totA);hitA++) {
                    x=t1-Gamma.logGamma(((float) hitA+1))-Gamma.logGamma(((float) totA-hitA+1))-Gamma.logGamma(((float) hitAB-hitA+1))-Gamma.logGamma(((float) totAB-totA-hitAB+hitA+1));
                   
                    pv=( pv>=x  ? pv+  Math.log(1.0+Math.exp(x-pv))  :  x+ Math.log(1.0+ Math.exp(pv-x)));
            }

            // Bonferroni correction (pval * nSEQF)
            //rankDB[id].pvalue=((float) rankDB[id].nSEQF)*exp(pv);
            //rankDB[id].pvalue=exp(pv);
            return( Math.exp(pv));
    }


Fisher 's exact test in R package: stats
===========================
myContin = matrix(c(1,4,7,4), nrow = 2);
fisher.test(myContin ,alternative = "two.sided" ) ; // two sided

fisher.test(myContin ,alternative = "greater" ) ; // right-hand sided
fisher.test(myContin ,alternative = "less" ) ; // left-hand sided

Tuesday, August 21, 2012

Python tutorial




How to install different packages:


In MAC, use pip

pip install -U numpy scipy scikit-learn

How to see where the package is installed


In MAC: pip show numpy


Import all

============

import className
OR
from package import className


Code Structure
========================

1.  just write script one line by one

print "ab"
print "cd"

2. Use function (Here you must need a main function):

def myfunc():
    print "Inside fnc ab"

if __name__ == '__main__':
    pass
    myfunc()
    objUtil = Utilities.printString("Hello world")

3. Use like java, class + function +main


myclass.py
--------------------

class LncRNA(object): 
    def __init__(self,chr , chrStart , chrEnd):
        self.chr = chr
        self.chrStart = chrStart
        self.chrEnd = chrEnd
       
    def classfunction():
        sOut='';
        sOut+= str(self.chr) + '\t' + str(self.chrStart) + '\t' + str(self.chrEnd);
        return sOut;

caller.py
--------------
def myfunc():
    print "Inside fnc ab"

if __name__ == '__main__':
    pass
    myfunc()
    obj = LncRNA("chr1" , 120 , 230)
    obj.classfunction()

 Like java : Inner class + function +
-----------------------------------------

class myinnter:
          def __init__(self,larray):                self.array = larray
           def fncinner:
                print "hi inner"

def myfnc:
      print "inside fnc"

if __name__ == '__main__':
    myfunc()
    obj = myinner;
   obj.fncinner


Return value  No need to write return in function definition
==============================

def myret(): 
 return val;


print
============
( like java , strins can be added by +)
( like c  BUT use % instead of , "%d" %i)
============================

print "#lncrna in %s" %chrName + " is: %d" %chrmWiseLNCRNAList.__len__()

==============================================
Array

Completely different from C/Java. It is like list. But only 1 type can be saved 
===============================================

// First way to create array like list
myArr = array('L')
myArr.append(54)
myArr.append(54)
print myArr[i]

// second way to create array like list
arrSize = 5;
myArr = [ 0 for val in range(arrSize)]
print myArr[i]


def arrayProcess():
    noRow = 2
    noCol = 4;
    table= [
            [ 0 for j in range(noCol) ]
                for i in range(noRow)
            ]
    print table
    print 'DONE'
    inc = 1
    for i in range(noRow):
        for j in range(noCol):
            table[i][j]= inc
            inc +=1
           
    for d1 in range(noRow):
        for d2 in range(noCol):
            print str(table[d1][d2]) + '\t',          
        print '\n',

for loop increment or step by 2 other than 1
==============================

for i in range( start, end , inc):
    print "Processing: " + i

for loop indexing x[ i : j] ==> from i to (j-1)
output for both case : 0 1 2
===========================
for i in range(0 ,3 ,1):
    print i,

a = []
a.append(0);a.append(1);a.append(2);a.append(3);
for i in a[0:3]:
    print i,


Basic string operation
=================

def stringProcess():
    s = "Hello World"
    myfields = s.split()
    for field in myfields:
        print field
    print "Str len is:" + str( s.__len__() )

    ####  capital 1st letter ####

    print s.capitalize() 
    ##### index find ####

    print "position of ello is: " + str( s.find("ello") )

    ##### substing ####
    print "substring: " + s[ s.find("ello"): 5]

   #### compare####
    if str(chrName).lower().__eq__('chr1') :
            return 0;
     else:
            return 1;

  ######  logical negation #######

   if not str(chr).lower().__eq__(chrmName ):
            continue

    #### concat####
    s = ''
    print s + " how are you?" # CONCAT

    #### replace ####
    newString= s.replace("H", "M")
    print newString
    print s

    ##### trim / strip ####
    print s.strip()
    # check prefix & postfix
    print s.startswith("He")
    print s.endswith("world")


L1. Create class
==============


class Motif(object):
    staticVar=5
    @staticmethod
    def staticFnc():
        print 555

    def __init__(self, id, name , value):
        self.id = id
        self.name = name
        self.value = value
   
    def __str__(self, *args, **kwargs):
        sOut='';
        sOut+= str(self.id) + '\t' + self.name + '\t' + str(self.value);
        return sOut;   
   
    def update(self,inc):
        self.value = self.value + inc


**** calling *******

from com.cbrc.bean.Motif import Motif

obj = Motif(1,'tanvir',10)
obj.update(5)
Motif.staticFnc()
print Motif.staticVar

obj.update(99)
print obj.value

print obj

2. Read from file
==============
from scipy.io.fopen import fopen

def readIt( fnameIn,fnameOut):  
    fidOut= open(fnameOut , "w"); 
    fid = open(fnameIn , "r");
    for curLine in fid:
        fidOut.write(curLine);
      
    fid.close();
    fidOut.close();

3. String split with regular expression
========================

import re

myPat= re.compile("[\s]+")
myFields = myPat.split(curLine )
        for col in myFields:
            print col


4. List of tuple
===============

def tupleProcess( fnameIn,fnameOut):
    myPat= re.compile("[\s]+")
    myList = list() # TUPLE OBJECT DECLARATION
   
    fidOut= open(fnameOut , "w");  
    fid = open(fnameIn , "r");
    for curLine in fid:
        myFields = myPat.split(curLine );
        chr = myFields[0]
        chrStart = int(myFields[1])
        chrEnd = int(myFields[2])
        myBed = (chr, chrStart, chrEnd) # tuple formation
        myList.append(myBed)
       
    # write data of LIST
    print myList.__len__()
    for curBed in myList:
        fidOut.write( curBed[0] + '\t' + str(curBed[1]) )
        fidOut.write("\n") 
  fid.close();

  # write buffer data to File
def writeFileFromBuffer( fnmOut, myBuffer):

    target = open(fnmOut ,"w")
    target.write(myBuffer + '' )
    target.close()


4 Vector of Objects
=================


class MyClass(object):
    def __init__(self, number):
        self.number = number
if __name__ == "__main__":
   my_objects = []

   for i in range(5):
      my_objects.append(MyClass(i))

   for obj in my_objects:
      print obj.number
==========================
5. Hashtable / Dictionary

# For un-ordered dictionary / oedered hashmap
myHash = dict()
myHash['mykey'] = val
print myHash['mykey']

# For ordered dictionary / oedered hashmap
import collections
dictLinesNR = collections.OrderedDict();

==========================

class LncRNA(object): 
    def __init__(self,chr , chrStart , chrEnd):
        self.chr = chr
        self.chrStart = chrStart
        self.chrEnd = chrEnd
       
    def __str__(self, *args, **kwargs):
        sOut='';
        sOut+= str(self.chr) + '\t' + str(self.chrStart) + '\t' + str(self.chrEnd);
        return sOut;

   def __getAnyValue__(self, *args, **kwargs):
    return self.chrStart;


def dictionaryProcess( fnameIn,fnameOut):
    myPat= re.compile("[\s]+")
    myDict = dict() 
    fidOut= open(fnameOut , "w");  
    fid = open(fnameIn , "r");
    for curLine in fid:
        myFields = myPat.split(curLine );
        chr = myFields[0]
        chrStart = int(myFields[1])
        chrEnd = int(myFields[2])
        myRNA = LncRNA(chr, chrStart, chrEnd) # tuple formation
        print myRNA
        if myDict.__contains__(chr):
          print "Already exist: " + chr
    else:
                myDict[chr] =   myRNA      # INSERT NEW ELEM   
    # write data of DICTIONARY
    print myDict.__len__()
    for k, v in myDict.iteritems():
        fidOut.write( k + '\t+ v.__getAnyValue__()+ '\t' + v.__str__() )
     
        fidOut.write("\n")

   fid.close();

============
6. SET
mySet = set()
mySet.add(elem)
============

def setProcess( fnameIn,fnameOut):
    myPat= re.compile("[\s]+")
    mySet = set() 
    fidOut= open(fnameOut , "w");  
    fid = open(fnameIn , "r");
    for curLine in fid:
        myFields = myPat.split(curLine );
        chr = myFields[0]
        chrStart = int(myFields[1])
        chrEnd = int(myFields[2])
        mySet.add(chr)  # INSERT NEW ELEM    

   fid.close();

    mySet2 = set([ 'chr2', 'chrX']);   
    # write data of SET
    print mySet.__len__()
    for elem in mySet:
        fidOut.write( elem )
        fidOut.write("\n")
    fidOut.close();

    sortedSet = sorted ( mySet , key= int )  
    uni = mySet | mySet2
    intrsct = mySet & mySet2
    diff = mySet - mySet2



2D Array
==============

Creation and initialize with 0:

myMat = [[0 for x in range( xDimention)] for y in range(yDimension)]

Write
def writeMatrix2D(myMat,  myseperator, outFile):
    print "writing matrix of count"    xSize = len(myMat)
    ySize = len(myMat[0])

    mybuffer = ''    for d1 in range(xSize):
        for d2 in range(ySize):
            if d2== (ySize-1):
                mybuffer += str(myMat[d1][d2])
            else:
                mybuffer += str(myMat[d1][d2]) + myseperator
        mybuffer += "\n"    writeFileFromBuffer(outFile, mybuffer)


Write buffer content to file

def writeFileFromBuffer( fnmOut, myBuffer):

    target = open(fnmOut ,"w")
    target.write(myBuffer + '' )
    target.close()



NUMPY OPERATIONS



Matrix operations


Matrix Creation
matrixNP1 = numpy.matrix(  numpy.zeros(shape=(totState , totHour)  , dtype = float) , dtype=float )

#Sum of Column
sumColumn = matrixNP1.sum(axis=0)

# Sum of Row
sumRow = matrixNP1.sum(axis=1)


# Divide each column by a vector ( e.g. sum of column)
matrixNP1 = matrixNP1/ matrixNP1.sum(axis=1)


# Divide each element  by a scalar
matrixNP1 = matrixNP1/1




Exception Handling


http://stackoverflow.com/questions/4990718/python-about-catching-any-exception



import sys
try:
    f = open('myfile.txt')
    s = f.readline()
    i = int(s.strip())
except IOError as (errno, strerror):
    print "I/O error({0}): {1}".format(errno, strerror)
except ValueError:
    print "Could not convert data to an integer."
except:  # handle all exceptions
    print "Unexpected error:", sys.exc_info()[0]
    raise




Thursday, August 2, 2012

linux installation common command


1. ./configure

2. make

3. make check

4. make install