1. Repository of GO ID annotations for genes
Download the file (gene_association.goa_human.gz) from:
http://geneontology.org/gene-associations/gene_association.goa_human.gz
http://geneontology.org/page/download-annotations
2. How to get GO term names/descriptions from GO IDs
Option 1 : use the file
http://purl.obolibrary.org/obo/go.obo
from http://geneontology.org/page/download-ontology
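A minimal sketch of how Option 1 could be done in base R, assuming the usual OBO stanza layout ([Term] blocks with "id:" and "name:" lines). The function name parse_obo_names and the toy fragment are only for illustration; they are not part of any package.

```r
# Toy OBO fragment; the real go.obo uses the same stanza layout.
obo_lines <- c(
  "format-version: 1.2",
  "",
  "[Term]",
  "id: GO:0000001",
  "name: mitochondrion inheritance",
  "namespace: biological_process",
  "",
  "[Term]",
  "id: GO:0000002",
  "name: mitochondrial genome maintenance",
  "namespace: biological_process",
  "",
  "[Typedef]",
  "id: part_of",
  "name: part of"
)

# Hypothetical helper: returns a character vector of term names, named by GO ID.
parse_obo_names <- function(lines) {
  ids <- character(0)
  term_names <- character(0)
  in_term <- FALSE
  cur_id <- NA_character_
  for (ln in lines) {
    if (startsWith(ln, "[")) {
      # a new stanza starts; only [Term] stanzas are kept
      in_term <- identical(ln, "[Term]")
      cur_id <- NA_character_
    } else if (in_term && startsWith(ln, "id: ")) {
      cur_id <- sub("^id: ", "", ln)
    } else if (in_term && startsWith(ln, "name: ") && !is.na(cur_id)) {
      ids <- c(ids, cur_id)
      term_names <- c(term_names, sub("^name: ", "", ln))
    }
  }
  setNames(term_names, ids)
}

go_names <- parse_obo_names(obo_lines)
go_names[["GO:0000001"]]
# For the downloaded file: go_names <- parse_obo_names(readLines("go.obo"))
```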
Option 2 : use R package
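# load the GO library
library(GO.db)
# GOTERM maps every GO ID to its term record; Term() extracts the term names
goterms = Term(GOTERM)
write.table( goterms, sep="\t", file="goterms.txt")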
3. How to get all GO terms for a gene
Option 1: Use the annotation database [BEST option: you can download the updated file and parse it by UniProt accession number]
http://geneontology.org/gene-associations/gene_association.goa_human.gz
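A rough sketch of the parsing step in base R, assuming the file is in GAF format (tab-separated, comment lines starting with "!", accession in column 2, GO ID in column 5). The rows below are made-up toy data, and go_terms_for is a hypothetical helper name.

```r
# Made-up GAF-style rows (only the first 9 columns) standing in for the real file.
gaf_lines <- c(
  "!gaf-version: 2.0",
  paste("UniProtKB", "ACC001", "GENE1", "", "GO:0000001", "PMID:0", "IDA", "", "P", sep = "\t"),
  paste("UniProtKB", "ACC001", "GENE1", "", "GO:0000002", "PMID:0", "IEA", "", "P", sep = "\t"),
  paste("UniProtKB", "ACC002", "GENE2", "", "GO:0000001", "PMID:0", "IDA", "", "P", sep = "\t")
)

# Read everything as text; "!" lines are skipped as comments.
gaf <- read.delim(text = gaf_lines, header = FALSE, comment.char = "!",
                  quote = "", colClasses = "character")
names(gaf)[c(2, 3, 5, 7)] <- c("db_object_id", "symbol", "go_id", "evidence")

# Hypothetical helper: all distinct GO IDs annotated to one accession.
go_terms_for <- function(gaf, accession) {
  unique(gaf$go_id[gaf$db_object_id == accession])
}

go_terms_for(gaf, "ACC001")
# For the downloaded file, something like:
# gaf <- read.delim(gzfile("gene_association.goa_human.gz"), header = FALSE,
#                   comment.char = "!", quote = "", colClasses = "character")
```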
Option 2 : Ontologizer, a Java tool run from the command line [It can also show annotations at child and parent level; scroll down for the commands]
http://compbio.charite.de/
Option 3 : Using R package biomaRt [Very slow]
library(biomaRt)
# list the available marts and the datasets of the chosen mart
databases = listMarts()
myDB = useMart("unimart")       # or "ensembl"
myDataset = listDatasets(myDB)  # e.g. "hsapiens_gene_ensembl"
# for UniProt (check listMarts() first; the "unimart" mart may no longer be hosted)
martUni = useMart(biomart = "unimart", dataset = "uniprot")
resultsUni = getBM(attributes = c("go_id","go_name"), filters = "accession", values = c("P49023"), mart = martUni)
# for Ensembl
martEn = useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
# attr = listAttributes(martEn)
# filters = listFilters(martEn)
resultsEn = getBM(attributes = c("go_id","name_1006"), filters = "refseq_mrna", values = c("NM_030621"), mart = martEn)
4. How to do GO term enrichment analysis
Option 1 : Ontologizer [Best option]
http://compbio.charite.de/
Works with any ID; I tested it with UniProt accession numbers.
Sample commands:
java -jar Ontologizer.jar -a gene_association.goa_human -g go.obo -s input.txt -p background.txt -c Parent-Child-Union -m Bonferroni -d 0.05 -r 1000 -n -o outputfolder
java -jar Ontologizer.jar -a gene_association.goa_human -g go.obo -s input.txt -p background.txt -c Parent-Child-Union -m Westfall-Young-Single-Step -d 0.05 -r 1000 -n -o outputfolder
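To give an idea of what an enrichment p-value means, here is a small R sketch of the plain term-for-term test (one-sided hypergeometric test for a single GO term) followed by a Bonferroni correction. This is not the Parent-Child-Union calculation used in the commands above, only the simpler idea behind it. The counts and the extra p-values are made up, and enrich_p is a hypothetical helper name.

```r
# k: study genes annotated to the term, n: study set size
# K: population genes annotated to the term, N: population size
enrich_p <- function(k, n, K, N) {
  # P(X >= k) when n genes are drawn from N, of which K carry the term
  phyper(k - 1, K, N - K, n, lower.tail = FALSE)
}

p_term <- enrich_p(k = 5, n = 20, K = 10, N = 200)

# Bonferroni correction over all tested terms (two extra made-up p-values)
p_adj <- p.adjust(c(p_term, 0.2, 0.5), method = "bonferroni")
p_adj
```

The same p-value comes out of fisher.test() with alternative = "greater" on the matching 2x2 table, which is a handy cross-check.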
Option 2 : Using the DAVID Web Service client:
http://david.abcc.ncifcrf.gov/
Just change the following variables:
String inputIds= new String("paxi_human,git1_human" );
String idType = new String("UNIPROT_ID");
String listName = new String("make_up");
int listType = 0;
double addListOutput =0;
//Set user defined categories
// String category_names = new String("BBID,BIOCARTA,COG_ONTOLOGY,INTERPRO,KEGG_PATHWAY,OMIM_DISEASE,PIR_SUPERFAMILY,SMART,SP_PIR_KEYWORDS,UP_SEQ_FEATURE");
String category_names = new String("GOTERM_BP_FAT,GOTERM_CC_FAT,GOTERM_MF_FAT");
API for Parameters
http://david.abcc.ncifcrf.gov/
Option 3 : R package GOFunction ( Only takes entrez ID)
http://bioconductor.org/
source("http://bioconductor.org/biocLite.R")
biocLite("GOFunction")
biocLite("org.Hs.eg.db")
biocLite("graph")
biocLite("Rgraphviz")
biocLite("SparseM")
library(GOFunction)
myDir="D:/research/GOAnalysis/";
setwd(myDir);
data(exampledata)
# Only takes entrez ID
myInterest=interestGenes[1:20]
myRef=refGenes[1:200]
sigTermBP = GOFunction(myInterest, myRef, organism="org.Hs.eg.db",
ontology="BP", fdrmethod="BY", fdrth=0.05, ppth=0.05, pcth=0.05, poth=0.05, peth=0.05, bmpSize=2000, filename="sigTermBP")
sigTermMF = GOFunction(myInterest, myRef, organism="org.Hs.eg.db",
ontology="MF", fdrmethod="BY", fdrth=0.05, ppth=0.05, pcth=0.05, poth=0.05, peth=0.05, bmpSize=2000, filename="sigTermMF")
5. How to calculate p-values for GO terms related to a gene
1. Wilcoxon Rank Sum and Signed Rank Tests:
https://stat.ethz.ch/R-manual/R-patched/library/stats/html/wilcox.test.html
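A small sketch of how wilcox.test could be used here, assuming each gene has some numeric score and we compare genes annotated with a GO term against the rest. The scores below are made up for illustration.

```r
# Made-up scores: genes annotated with the term vs. all other genes
in_term  <- c(2.1, 2.5, 3.0, 3.4, 2.8)
not_term <- c(0.1, 0.4, 0.9, 1.2, 0.7, 1.5, 0.3, 1.1)

# Two-sided rank sum test: do the two groups differ in location?
wt <- wilcox.test(in_term, not_term, alternative = "two.sided")
wt$p.value
```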
2. Find the confidence interval using bootstrap
library(boot)
# statistic: mean of the resampled data (i holds the resampled indices)
fsimple = function(curdata, i){
return(mean(curdata[i]))
}
mydata=rnorm(100,0,1)
bootsimple = boot(mydata, fsimple , R=500)
plot(bootsimple )
simpleCI=boot.ci( bootsimple , conf=0.90 , type = "all")
simpleCI$normal
3. Find the CDF and calculate the threshold (1 - p-value):
# Samples must be sorted ( e.g. 2,2,2, 3,3, 4,4,4, 5,5 )
mysample=sort(c( 2,2,2, 3,3, 4,4,4,5,5 ))
mydf = data.frame( mySamples=mysample )
myCDF = ( 1: length(mydf$mySamples ))/length( mydf$mySamples )
# select the smallest sample value whose CDF reaches the threshold
myThr=0.95
idOfMatchSamples = which(myCDF >= myThr)
mySampleAtThr = mysample[idOfMatchSamples[1]]
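As an alternative sketch, base R already has ecdf() and quantile() for this. With type = 1, quantile() returns the inverse of the empirical CDF, i.e. the smallest sample value whose CDF reaches the threshold. The sample is the same toy vector as above.

```r
mysample <- c(2, 2, 2, 3, 3, 4, 4, 4, 5, 5)

# Empirical CDF as a function; Fn(x) is the fraction of samples <= x
Fn <- ecdf(mysample)
cdf_at_4 <- Fn(4)

# Sample value at the 0.95 threshold (inverse of the empirical CDF)
thr_value <- quantile(mysample, probs = 0.95, type = 1, names = FALSE)
thr_value
```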
Other options (still need to check):
- gProfileR: http://cran.r-project.org/web/packages/gProfileR/gProfileR.pdf
- goProfiles: http://bioconductor.org/packages/release/bioc/vignettes/goProfiles/inst/doc/goProfiles.pdf
- topGO: http://www.bioconductor.org/packages/release/bioc/html/topGO.html
- GOstats: http://bioconductor.org/packages/release/bioc/vignettes/GOstats/inst/doc/GOstatsForUnsupportedOrganisms.pdf
Which file to choose for annotation:
from http://geneontology.org/page/download-annotations
Option 1: gene_association.goa_ref_human
These files contain all GO annotations and protein information for a species.
1. They cover a subset of the proteins in the UniProt KnowledgeBase (UniProtKB).
2. They provide one protein per gene.
3. The protein accessions included in these files are the protein sequences annotated in Swiss-Prot, or the longest TrEMBL transcript if there is no Swiss-Prot record.
4. If a particular protein accession is not annotated with GO, it will not appear in this file.
Option 2: gene_association.goa_human
These files contain all GO annotations and information for a species.
1. They cover a subset of the proteins in the UniProt KnowledgeBase (UniProtKB) and also entities other than proteins, e.g. macromolecular complexes (IntAct Complex Portal identifiers) or RNAs (RNAcentral identifiers).
2. They may provide annotations for more than one protein per gene.
3. The protein accessions included in these files are all Swiss-Prot entries for that species plus any TrEMBL entries that have an Ensembl DR line. The TrEMBL entries are likely to overlap with the Swiss-Prot entries or their isoforms.
4. If a particular entity is not annotated with GO, it will not appear in this file.
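One quick way to see the difference between the two files is to count how many distinct protein accessions each gene symbol has. A sketch in base R on a made-up table (the column names db_object_id and symbol are assumptions, standing for GAF columns 2 and 3):

```r
# Made-up annotation rows: GENE1 has two protein accessions, GENE2 has one
gaf <- data.frame(
  db_object_id = c("ACC001", "ACC001", "ACC002", "ACC003"),
  symbol       = c("GENE1",  "GENE1",  "GENE1",  "GENE2"),
  stringsAsFactors = FALSE
)

# One row per (accession, gene) pair, then count accessions per gene
acc_gene <- unique(gaf[, c("db_object_id", "symbol")])
proteins_per_gene <- table(acc_gene$symbol)

# Genes with more than one protein (expected only in the goa_human style file)
multi <- names(proteins_per_gene)[proteins_per_gene > 1]
multi
```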