Wednesday, March 21, 2012

Feature Selection


1. Matlab using TreeBagger (it is actually like Random Forest)
==========================================

load ionosphere;

noBag = 5;
myBag  = TreeBagger( noBag  ,   X, Y, 'OOBPred','on' , 'oobvarimp' ,'on' );


% Increase in prediction error if the values of that variable are permuted across OOB observations.
% The larger the increase in prediction error, the more important the variable is.
oobVarImp = myBag.OOBPermutedVarDeltaError


% Per-tree variable importance, computed on the training data (re-substitution)
noFeature = size(X,2);
varImp = zeros( noBag, noFeature);
for i=1:noBag
   varImp(i,:) = varimportance( myBag.Trees{i});
end
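
The permutation idea behind OOBPermutedVarDeltaError can be sketched outside Matlab as well. The Python snippet below is only a simplified illustration, not TreeBagger's actual computation: it uses a single hypothetical predictor function and the whole data set instead of per-tree out-of-bag samples, and the names permutation_importance, predict, X and y are made up for the example.

```python
import random

def permutation_importance(predict, X, y, seed=0):
    """Increase in error rate when one feature column is shuffled.

    predict: function mapping one row (list of numbers) to a label.
    X: list of rows, y: list of true labels.
    """
    rng = random.Random(seed)

    def error(rows):
        # fraction of rows that are misclassified
        return sum(predict(r) != t for r, t in zip(rows, y)) / len(y)

    base = error(X)
    deltas = []
    for j in range(len(X[0])):
        col = [r[j] for r in X]
        rng.shuffle(col)  # break the link between feature j and the labels
        permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(X, col)]
        deltas.append(error(permuted) - base)
    return deltas

# Toy data: feature 0 decides the label, feature 1 is ignored by the model.
X = [[i / 10, ((i * 7) % 10) / 10] for i in range(10)]
predict = lambda r: int(r[0] > 0.45)
y = [predict(r) for r in X]
deltas = permutation_importance(predict, X, y)
print(deltas)
```

Shuffling the ignored feature cannot change any prediction, so its delta is exactly zero, while the delta of the informative feature cannot be negative here because the unshuffled error is already zero.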

========== COMPLETE CODE==========

function fromRF

load ionosphere;
noBag = 5;
myBag  = TreeBagger( noBag  ,   X, Y, 'OOBPred','on');
varRanking = zeros( noBag  , size(X,2) ) ;

for i=1:noBag
   % sort the features of tree i by importance, most important first
   [ val ,varRanking( i , :) ]= sort( varimportance( myBag.Trees{i}) ,'descend');
end

% suppose we finally take the top-ranked 5 features from every tree
topRank=5;
selectedFeat=[];
for i=1:noBag
   selectedFeat = union( selectedFeat , varRanking( i , 1:topRank) );
end

disp('done');

end
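
To make the selection step concrete, here is a small Python sketch of the same "union of the top-ranked features of every tree" logic. The importance scores are made up for illustration, the function name select_top_features is hypothetical, and the indices are 0-based (Matlab's are 1-based).

```python
def select_top_features(importances, top_rank):
    """importances: one list of per-feature scores for each tree."""
    selected = set()
    for scores in importances:
        # feature indices ordered from most to least important
        ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        selected.update(ranking[:top_rank])
    return sorted(selected)

# Made-up scores for 3 trees and 4 features
toy_scores = [
    [0.1, 0.5, 0.0, 0.3],
    [0.4, 0.2, 0.0, 0.1],
    [0.0, 0.6, 0.1, 0.2],
]
print(select_top_features(toy_scores, 2))
```

With these toy scores the trees contribute features {1, 3}, {0, 1} and {1, 3}, so the union is features 0, 1 and 3; feature 2 is never among the top two and is dropped.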


Tuesday, March 20, 2012

JAVA CLASSPATH setting



Source: http://weka.wikispaces.com/CLASSPATH

Win32 (2k and XP)

We assume that the mysql-connector-java-3.1.8-bin.jar archive is located in the following directory:
  • C:\Program Files\Weka-3-4
In the Control Panel click on System (or right click on My Computer and select Properties) and then go to the Advanced tab. There you will find a button called Environment Variables, click it.
Depending on whether you're the only person using this computer or it is a lab computer shared by many, you can either create a new system-wide environment variable (if you are the only user) or a user-dependent one (recommended for multi-user machines). Enter the following name for the variable
  • CLASSPATH
and add this value
  • C:\Program Files\Weka-3-4\mysql-connector-java-3.1.8-bin.jar
If you want to add additional jars, you'll have to separate them with the path separator, the semicolon ; (no spaces!).

Unix/Linux

I assume that the mysql jar is located in the following directory:
  • /home/johndoe/jars/
Open a shell and execute the following command, depending on the shell you're using:
  • bash
    export CLASSPATH=$CLASSPATH:/home/johndoe/jars/mysql-connector-java-3.1.8-bin.jar
  • c shell
    setenv CLASSPATH $CLASSPATH:/home/johndoe/jars/mysql-connector-java-3.1.8-bin.jar

Unix/Linux uses the colon : as path separator, in contrast to Win32, which uses the semicolon ;

Run before adding external jar Unix/Linux

 java -jar weka.jar

Run after adding external jar Unix/Linux

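You cannot use the same java -jar command once you add an external jar: with -jar, java takes all user classes from that jar and ignores the CLASSPATH variable and any -classpath option. Use java -classpath and name the main class instead:

java -classpath $CLASSPATH:weka.jar:libsvm.jar weka.gui.GUIChooser  (Linux)

java -classpath "%CLASSPATH%;weka.jar;libsvm.jar" weka.gui.GUIChooser  (Windows)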
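
As a small illustration of the separator difference, the following Python sketch builds the java command line for either platform. The helper names build_classpath and java_command are hypothetical; the jar names and the main class are the ones used above.

```python
def build_classpath(entries, windows=False):
    """Join classpath entries with ';' on Windows and ':' on Unix/Linux."""
    separator = ";" if windows else ":"
    return separator.join(entries)

def java_command(entries, main_class, windows=False):
    """Argument list for launching main_class with the given jars."""
    return ["java", "-classpath", build_classpath(entries, windows), main_class]

jars = ["weka.jar", "libsvm.jar"]
print(" ".join(java_command(jars, "weka.gui.GUIChooser")))
print(" ".join(java_command(jars, "weka.gui.GUIChooser", windows=True)))
```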

Monday, March 19, 2012

java memory allocation memory heap size control -Xmx -Xms

Taken from: http://javahowto.blogspot.com/2006/06/6-common-errors-in-setting-java-heap.html

Two JVM options are often used to tune JVM heap size: -Xmx for maximum heap size, and -Xms for initial heap size. Here are some common mistakes I have seen when using them:
  • Missing m, M, g or G at the end (they are case insensitive). For example,
    java -Xmx128 BigApp
    java.lang.OutOfMemoryError: Java heap space
    The correct command should be: java -Xmx128m BigApp. To be precise, -Xmx128 is a valid setting for very small apps, like HelloWorld. But in real life, I guess you really mean -Xmx128m

  • Extra space in JVM options, or incorrectly use =. For example,
    java -Xmx 128m BigApp
    Invalid maximum heap size: -Xmx
    Could not create the Java virtual machine.
    
    java -Xmx=512m HelloWorld
    Invalid maximum heap size: -Xmx=512m
    Could not create the Java virtual machine.
    The correct command should be java -Xmx128m BigApp, with no whitespace nor =. -X options are different than -Dkey=value system properties, where = is used.

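  • Only setting the -Xms JVM option, with a value greater than the default maximum heap size, which was 64m on the older JVMs this note was written for (newer JVMs pick a default based on the machine's memory). The default minimum heap size seems to be 0. For example,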
    java -Xms128m BigApp
    Error occurred during initialization of VM
    Incompatible initial and maximum heap sizes specified
    The correct command should be java -Xms128m -Xmx128m BigApp. It's a good idea to set the minimum and maximum heap size to the same value. In any case, don't let the minimum heap size exceed the maximum heap size.

  • Heap size is larger than your computer's physical memory. For example,
    java -Xmx2g BigApp
    Error occurred during initialization of VM
    Could not reserve enough space for object heap
    Could not create the Java virtual machine.
    The fix is to make it lower than the physical memory: java -Xmx1g BigApp

  • Incorrectly using mb as the unit, where m or M should be used instead. For example,
    java -Xms256mb -Xmx256mb BigApp
    Invalid initial heap size: -Xms256mb
    Could not create the Java virtual machine.
    The correct command should be java -Xms256m -Xmx256m BigApp

  • The heap size is larger than JVM thinks you would ever need. For example,
    java -Xmx256g BigApp
    Invalid maximum heap size: -Xmx256g
    The specified size exceeds the maximum representable size.
    Could not create the Java virtual machine.
    The fix is to lower it to a reasonable value: java -Xmx256m BigApp

  • The value is not a whole number. For example,
    java -Xmx0.9g BigApp
    Invalid maximum heap size: -Xmx0.9g
    Could not create the Java virtual machine.
    The correct command should be java -Xmx921m BigApp (0.9g is about 921m, since 0.9 x 1024 = 921.6)
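
The syntax rules in the list above can be summarised in a few lines of Python. This is only a sketch of the rules as described in this post (whole number, optional k/m/g suffix, no space, no =, no "mb"), not the JVM's own parser, and the name parse_heap_option is made up for the example.

```python
import re

# -Xms or -Xmx, then a whole number, then an optional one-letter unit
_HEAP_OPTION = re.compile(r"-Xm[sx](\d+)([kKmMgG]?)")
_UNIT_BYTES = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}

def parse_heap_option(option):
    """Return the size in bytes, or None if the option is malformed."""
    match = _HEAP_OPTION.fullmatch(option)
    if match is None:
        return None
    return int(match.group(1)) * _UNIT_BYTES[match.group(2).lower()]

for candidate in ["-Xmx128m", "-Xmx128", "-Xmx=512m", "-Xms256mb", "-Xmx0.9g"]:
    print(candidate, parse_heap_option(candidate))
```

Note that -Xmx128 parses successfully but means 128 bytes, which is the first mistake in the list: the syntax is accepted, yet the value is almost never what was intended.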
PS:

How to set java heap size in Tomcat?
Stop Tomcat server, set environment variable CATALINA_OPTS, and then restart Tomcat. Look at the file tomcat-install/bin/catalina.sh or catalina.bat for how this variable is used. For example,
set CATALINA_OPTS=-Xms512m -Xmx512m  (Windows, no "" around the value)
export CATALINA_OPTS="-Xms512m -Xmx512m"  (ksh/bash, "" around the value)
setenv CATALINA_OPTS "-Xms512m -Xmx512m"  (tcsh/csh, "" around the value)
In catalina.bat or catalina.sh, you may have noticed that CATALINA_OPTS, JAVA_OPTS, or both can be used to specify Tomcat JVM options. What is the difference between CATALINA_OPTS and JAVA_OPTS? The name CATALINA_OPTS is specific to the Tomcat servlet container, whereas JAVA_OPTS may be used by other Java applications (e.g., JBoss). Since environment variables are shared by all applications, we don't want Tomcat to inadvertently pick up the JVM options intended for other apps. I prefer to use CATALINA_OPTS.

How to set java heap size in JBoss?
Stop JBoss server, edit $JBOSS_HOME/bin/run.conf, and then restart JBoss server. You can change the line with JAVA_OPTS to something like:
JAVA_OPTS="-server -Xms128m -Xmx128m"
How to set java heap size in Eclipse?
You have 2 options:
1. Edit eclipse-home/eclipse.ini to be something like the following and restart Eclipse.
-vmargs
-Xms64m
-Xmx256m
2. Or, you can just run eclipse command with additional options at the very end. Anything after -vmargs will be treated as JVM options and passed directly to the JVM. JVM options specified in the command line this way will always override those in eclipse.ini. For example,
eclipse -vmargs -Xms64m -Xmx256m
How to set java heap size in NetBeans?
Exit NetBeans, edit the file netbeans-install/etc/netbeans.conf. For example,
netbeans_default_options="-J-Xms512m -J-Xmx512m -J-XX:PermSize=32m -J-XX:MaxPermSize=128m -J-Xverify:none"
How to set java heap size in Apache Ant?
Set environment variable ANT_OPTS. Look at the file $ANT_HOME/bin/ant or %ANT_HOME%\bin\ant.bat, for how this variable is used by Ant runtime.
set ANT_OPTS=-Xms512m -Xmx512m  (Windows)
export ANT_OPTS="-Xms512m -Xmx512m"  (ksh/bash)
setenv ANT_OPTS "-Xms512m -Xmx512m"  (tcsh/csh)
How to set java heap size in jEdit?
jEdit is a java application, and basically you need to set minimum/maximum heap size JVM options when you run java command. jEdit by default runs with a default maximum heap size 64m. When you work on large files, you are likely to get these errors:
java.lang.OutOfMemoryError: Java heap space
at java.lang.String.concat(String.java:2001)
at org.gjt.sp.jedit.buffer.UndoManager.contentInserted(UndoManager.java:160)
at org.gjt.sp.jedit.Buffer.insert(Buffer.java:1139)
at org.gjt.sp.jedit.textarea.JEditTextArea.setSelectedText(JEditTextArea.java:2052)
at org.gjt.sp.jedit.textarea.JEditTextArea.setSelectedText(JEditTextArea.java:2028)
at org.gjt.sp.jedit.Registers.paste(Registers.java:263)

How to fix it? If you click a desktop icon, or Start menu item to start jEdit: right-click the icon or menu item, view its property, and you can see its target is something like:
C:\jdk6\bin\javaw.exe -jar "C:\jedit\jedit.jar"
You can change that line to:
C:\jdk6\bin\javaw.exe -Xmx128m -Xms128m -jar "C:\jedit\jedit.jar"
If you run a script to start jEdit: just add these JVM options to the java line inside the script file:
java -Xmx128m -Xms128m -jar jedit.jar
If you start jEdit by running java command: just add these JVM options to your java command:
java -Xmx128m -Xms128m -jar jedit.jar
Note that when you run java with -jar option, anything after -jar jar-file will be treated as application arguments. So you should always put JVM options before -jar. Otherwise, you will get error:
C:\jedit>java -jar jedit.jar -Xmx128m
Unknown option: -Xmx128m
Usage: jedit [options] [files]
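
The ordering rule can be made explicit with a short Python sketch that splits the arguments of a java command the way the rule above describes. It is an illustration only, assuming a jar file name always follows -jar; split_java_args is a made-up helper, not part of any tool.

```python
def split_java_args(args):
    """Split the arguments after 'java' into (jvm_options, jar, app_args)."""
    if "-jar" not in args:
        return list(args), None, []
    i = args.index("-jar")
    if i + 1 >= len(args):
        raise ValueError("-jar must be followed by a jar file")
    # everything before -jar goes to the JVM, everything after the jar to the app
    return args[:i], args[i + 1], args[i + 2:]

print(split_java_args(["-Xmx128m", "-Xms128m", "-jar", "jedit.jar"]))
print(split_java_args(["-jar", "jedit.jar", "-Xmx128m"]))
```

In the second call -Xmx128m ends up in the application arguments, which is why jEdit itself reports it as an unknown option.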
How to set java heap size in JavaEE SDK/J2EE SDK/Glassfish/Sun Java System Application Server?
Stop the application server, edit
$GLASSFISH_HOME/domains/domain1/config/domain.xml, search for XML element name java-config and jvm-options. For example,

<jvm-options>-Xmx512m</jvm-options>
<jvm-options>-XX:NewRatio=2</jvm-options>
<jvm-options>-XX:MaxPermSize=128m</jvm-options>
...
You can also change these settings in the web-based admin console, typically at http://localhost:4848/, or https://localhost:4848/. Go to Application Server near the top of the left panel, and then on the right panel, click JVM Settings | JVM Options, and you will see a list of existing JVM options. You can add new ones and modify existing ones there.

Yet another option is to use its Command Line Interface (CLI) tool command, such as:
./asadmin help create-jvm-options
./asadmin help delete-jvm-options
They may be a bit hard to use manually, but are well suited for automated scripts.

Tuesday, March 13, 2012

Matlab plot graph


To plot
=====
plot( 100*codingCov, 100*noncodingCov,'.');

Change the size of default figure
=================
% set the default first; it only affects figures created afterwards
set(0, 'DefaultFigurePosition', [ leftPos bottomPos width height ]);
figure

To limit the axis value
=================
xlim([0 100]); ylim([0 100]);

Mark or tick each point of axis as you wish
============================
stateName = { 'state1'; 'state2'; 'state3'; 'state4' };
set(gca,'XTickLabel',stateName)

Interactive graph with click show a message
================================
Override the default data cursor callback function that handles the mouse click, or select your own. The returned message must be a string or a cell array of strings.

function output_txt = myCallback(obj,event_obj)
% Display the position of the data cursor
% obj          Currently not used (empty)
% event_obj    Handle to event object
% output_txt   Data cursor text string (string or cell array of strings).

fnameStat = '../gene.features/allMotifCNC.stat';
[ covCoding covNonCoding score  consensus ] = textread(fnameStat,'%f\t%f\t%f\t%s');
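noMotif = numel(consensus);   % number of motifs read from the file (335 here)

pos = get(event_obj,'Position');

% Find the motif whose coordinates match the clicked point
output_txt = {'no matching motif'};
for i=1:noMotif
   if covCoding(i) == pos(1)  && covNonCoding(i) == pos(2)
     output_txt = consensus(i);
     break;
   end
end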

if length(pos) > 2
    output_txt{end+1} = ['Z: ',num2str(pos(3),4)];
end


Sunday, March 11, 2012

3 steps for p-value ( p value ) analysis


 p-value:
=======
Probability (or the area) at the tail of a bell shaped curve, where,

    center of bell = population mean,
    marker = sample mean
    p-value = area remaining in the tail beyond the marker, after deducting the area between the center and the marker
 
intuition:
=========
The smaller the area ==> the greater the distance between the center and the sample mean. So the assumed population mean looks less plausible, and we can reject the hypothesis.

Steps to calculate p-value


1. Assume null hypothesis ( i.e.  Population mean)
====================================
State the value you assume for the population mean. This assumption ( in practice, the opposite of what you want to show ) will be the Null Hypothesis.

2. Calculate sample statistics
=====================
Now, calculate statistics from available data.

3. Calculate p-value
==================
If p-value is small (below the chosen significance level, commonly 0.05) ==> distance between population mean and sample mean is high ==> Reject Null Hypothesis

If p-value is big ==> distance between population mean and sample mean is small ==> Fail to Reject Null Hypothesis
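
The three steps can be sketched in Python for the simplest case, a z-test where the population standard deviation is assumed to be known. That assumption, the function name z_test_p_value and the example numbers are all illustrative and not part of the notes above.

```python
import math

def z_test_p_value(sample_mean, pop_mean, pop_sd, n, two_sided=True):
    """p-value of a z-test with known population standard deviation."""
    # distance between sample mean and assumed population mean, in standard errors
    z = (sample_mean - pop_mean) / (pop_sd / math.sqrt(n))
    # area in one tail of the standard normal curve beyond |z|
    tail = 0.5 * math.erfc(abs(z) / math.sqrt(2))
    return 2 * tail if two_sided else tail

# Step 1: null hypothesis, population mean = 100 (sd assumed to be 15)
# Step 2: sample statistics, mean 106 from n = 36 observations
# Step 3: p-value
p = z_test_p_value(106, 100, 15, 36)
print(p, "reject" if p < 0.05 else "fail to reject")
```

Here the sample mean lies 2.4 standard errors from the assumed population mean, so the tail area is small and the null hypothesis would be rejected at the 0.05 level.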



         Fig : p-value (source: wiki )


Since we can never accept a null hypothesis, only reject it or fail to reject it, we try to reject it. So it is wise to choose as the Null Hypothesis the opposite of what we are trying to establish.

Friday, March 2, 2012

bioinformatics algorithm


1. Global alignment
===================
Needleman–Wunsch algorithm
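
A minimal Python sketch of the Needleman-Wunsch scoring recurrence, assuming a simple scheme of +1 for a match, -1 for a mismatch and -1 per gap (these values are illustrative defaults, not prescribed by the algorithm). It returns only the optimal global score, not the alignment itself.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of strings a and b."""
    # first row: aligning the empty prefix of a with b[:j] costs j gaps
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # first column: a[:i] against the empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

print(needleman_wunsch_score("GATTACA", "GATTACA"))
print(needleman_wunsch_score("AC", "AGC"))
```

Only two rows of the table are kept at a time, which is enough for the score; recovering the alignment would need the full table for traceback.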

2. Local alignment
===================
Smith-Waterman algorithm
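
A minimal Python sketch of the Smith-Waterman recurrence, assuming +2 for a match, -1 for a mismatch and -1 per gap (illustrative values only). Compared with global alignment, every cell is floored at 0 and the answer is the best cell anywhere in the table, so the function returns the best local score rather than the alignment itself.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between any substrings of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # 0 lets a local alignment start fresh at this cell
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best

print(smith_waterman_score("XXACGTXX", "YYACGTYY"))
```

In the example only the shared ACGT region contributes, giving four matches at +2 each.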