Skip to main content

code p-value calculation


  pvalue/p-value calculation

HYPERGEOMETRIC TEST


http://mengnote.blogspot.com/2012/12/calculate-correct-hypergeometric-p.html

# x, q     vector of quantiles representing the number of white balls drawn without replacement from an urn which contains both black and white balls.
# m     the number of white balls in the urn.
# n     the number of black balls in the urn.
# k     the number of balls drawn from the urn.
# phyper(q, m, n, k, lower.tail = TRUE, log.p = FALSE)

sucessDrawn=1
sucessTotal=2

sampleSize=24
populatonSizeWithoutSample=1751-sampleSize

phyper(sucessDrawn-1, sucessTotal, populatonSizeWithoutSample, sampleSize, lower.tail = FALSE, log.p = FALSE)

FISHER EXACT TEST


Bonferroni correction: Pvalue* (#of test cases)

http://www.une.edu.au/WebStat/unit_materials/c7_anova/oneway_bonferroni_adjust.htm

http://www.langsrud.com/stat/fisher.htm

Code done by  Benoit Marchand

/*
 * pvalcalc.cpp
 *      Author: 
Benoit Marchand
 */
#include
#include


    /*
     *                 CLASSA    CLASSB
     *                        -----------------
     *  PRESENT    |      |        |    TOTALpresent= n
     *  ABSENT      |          |        |    TOTALabsent
     *                         ------------------------------------------------
     *     K= TOTALclassA|        |  N = TOTALclass A + B
     * 

/*
 *  k - #PRESENT in CLASSA
 *  n - #PRESENT in CLASSA + CLASSB
 *
 *  K - #TOTAL sample in CLASSA
 *  N - #TOTAL sample in CLASSA + CLASSB
 */
double calcOneMotifPValue(int k,int n,int K,int N)
{
        double pv,t1,x;

/*
        k      n-k         |  n
        K-k    (N-K)-(n-k) |  N-n
        -------------------------
        K      N-K         |  N


        This calculates the Right Hand Side PValue.
        (LHS would be k=0,n.)
*/
        t1=lgamma(((float) K+1))+lgamma(((float) N-K+1))-lgamma(((float) N+1))+lgamma(((float) n+1))+lgamma(((float) N-n+1));
        for (pv=-1e100;k<=n && k<=K && (n-k)<=(N-K);k++) {
                x=t1-lgamma(((float) k+1))-lgamma(((float) K-k+1))-lgamma(((float) n-k+1))-lgamma(((float) N-K-n+k+1));
                pv=(pv>=x?pv+log(1.0+exp(x-pv)):x+log(1.0+exp(pv-x)));
        }

        // Bonferroni correction (pval * nSEQF)
        //rankDB[id].pvalue=((float) rankDB[id].nSEQF)*exp(pv);
        //rankDB[id].pvalue=exp(pv);
        return(exp(pv));
}


int main()
{
    printf("%.8f", calcOneMotifPValue( 75,100, 155, 200) );
}



////////////  JAVA VERSION 
import org.apache.commons.math.special.Gamma;

/*
     *                 CLASSA    CLASSB
     *                        -----------------
     *  PRESENT    |    k    |        |    TOTALpresent=n
     *  ABSENT      |          |        |    TOTALabsent
     *                         -------------------------------
     *         TOTALclassA  |        |      TOTALclass A + B
     *                 K                                    N
     */

    /*
     *  k - #PRESENT in CLASSA
     *  n - #PRESENT in CLASSA + CLASSB
     *
     *  K - #TOTAL sample in CLASSA
     *  N - #TOTAL sample in CLASSA + CLASSB
     */
    public static double calcRightsidePValue(int hitA,int hitAB,int totA,int totAB)
    {
            double pv,t1,x;

    /*
            k      n-k         |  n
            K-k    (N-K)-(n-k) |  N-n
            -------------------------
            K      N-K         |  N


            This calculates the Right Hand Side PValue.
            (LHS would be k=0,n.)
    */
            t1= Gamma.logGamma(((float) totA+1))+Gamma.logGamma(((float) totAB-totA+1))-Gamma.logGamma(((float) totAB+1))+Gamma.logGamma(((float) hitAB+1))+Gamma.logGamma(((float) totAB-hitAB+1));
            for (pv=-1e100;hitA<=hitAB && hitA<=totA && (hitAB-hitA)<=(totAB-totA);hitA++) {
                    x=t1-Gamma.logGamma(((float) hitA+1))-Gamma.logGamma(((float) totA-hitA+1))-Gamma.logGamma(((float) hitAB-hitA+1))-Gamma.logGamma(((float) totAB-totA-hitAB+hitA+1));
                   
                    pv=( pv>=x  ? pv+  Math.log(1.0+Math.exp(x-pv))  :  x+ Math.log(1.0+ Math.exp(pv-x)));
            }

            // Bonferroni correction (pval * nSEQF)
            //rankDB[id].pvalue=((float) rankDB[id].nSEQF)*exp(pv);
            //rankDB[id].pvalue=exp(pv);
            return( Math.exp(pv));
    }


Fisher 's exact test in R package: stats
===========================
myContin = matrix(c(1,4,7,4), nrow = 2);
fisher.test(myContin ,alternative = "two.sided" ) ; // two sided

fisher.test(myContin ,alternative = "greater" ) ; // right-hand sided
fisher.test(myContin ,alternative = "less" ) ; // left-hand sided

Comments

Popular posts from this blog

MATLAB cross validation

// use built-in function samplesize = size( matrix , 1); c = cvpartition(samplesize,  'kfold' , k); % return the indexes on each fold ///// output in matlab console K-fold cross validation partition              N: 10    NumTestSets: 4      TrainSize: 8  7  7  8       TestSize: 2  3  3  2 ////////////////////// for i=1 : k    trainIdxs = find(training(c,i) ); %training(c,i);  // 1 means in train , 0 means in test    testInxs  = find(test(c,i)       ); % test(c,i);       // 1 means in test , 0 means in train    trainMatrix = matrix (  matrix(trainIdxs ), : );    testMatrix  = matrix (  matrix(testIdxs  ), : ); end //// now calculate performance %%  calculate performance of a partition     selectedKfoldSen=[];selectedKfoldSpe=[];selectedKfoldAcc=[];     indexSen=1;indexSpe=1;indexAcc=1;     if ( kfold == (P+N) )% leave one out         sensitivity = sum(cvtp) /( sum(cvtp) + sum(cvfn) )         specificity = sum(cvtn) /( sum(cvfp) + sum(cvtn) )         acc

R tutorial

Install R in linux ============ In CRAN home page, the latest version is not available. So, in fedora, Open the terminal yum list R  --> To check the latest available version of r yum install R --> install R version yum update R --> update current version to latest one 0 find help ============ ?exact topic name (  i.e.   ?mean ) 0.0 INSTALL 3rd party package  ==================== install.packages('mvtnorm' , dependencies = TRUE , lib='/home/alamt/myRlibrary/')   #  install new package BED file parsing (Always use read.delim it is the best) library(MASS) #library(ggplot2) dirRoot="D:/research/F5shortRNA/TestRIKEN/Rscripts/" dirData="D:/research/F5shortRNA/TestRIKEN/" setwd(dirRoot) getwd() myBed="test.bed" fnmBed=paste(dirData, myBed, sep="") # ccdsHh19.bed   tmp.bed ## Read bed use read.delim - it is the  best mybed=read.delim(fnmBed, header = FALSE, sep = "\t", quote = &q