My Blog

Posts

Showing posts from March, 2012

Feature Selection

1. Matlab using TreeBagger (it is actually like Random Forest)
==========================================
load ionosphere;
noBag = 5;
myBag = TreeBagger(noBag, X, Y, 'OOBPred', 'on', 'oobvarimp', 'on');

% Increase in prediction error if the values of that variable are permuted across OOB observations.
% The larger the increase in prediction error ==> the more important the variable is.
oobVarImp = myBag.OOBPermutedVarDeltaError

% re-substitution error
noFeature = size(X, 2);
varImp = zeros(noBag, noFeature);
for i = 1:noBag
    varImp(i, :) = varimportance(myBag.Trees{i});
end

========== COMPLETE CODE ==========
function fromRF
load ionosphere;
noBag = 5;
myBag = TreeBagger(noBag, X, Y, 'OOBPred', 'on');
varRanking = zeros(noBag, size(X, 2));
for i = 1:noBag
    [val, varRanking(i, :)] = sort(varimportance(myBag.Trees{i}), 'descend')
end
% suppose finally taking top rank...
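The idea behind OOBPermutedVarDeltaError (shuffle one variable, see how much the prediction error grows) can be sketched outside Matlab as well. The Python below is only an illustration under loud assumptions: it uses a toy threshold classifier instead of bagged trees, it permutes on the full data set rather than on out-of-bag observations, and the names error_rate and permutation_importance are made up for this sketch.

```python
import random

def error_rate(predict, rows, labels):
    """Fraction of rows whose predicted label differs from the true label."""
    wrong = sum(1 for row, y in zip(rows, labels) if predict(row) != y)
    return wrong / len(rows)

def permutation_importance(predict, rows, labels, seed=0):
    """Increase in error rate after shuffling each column in turn."""
    rng = random.Random(seed)
    baseline = error_rate(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        rng.shuffle(column)
        # Rebuild the rows with only column j replaced by its shuffled values.
        permuted = [row[:j] + [v] + row[j + 1:] for row, v in zip(rows, column)]
        importances.append(error_rate(predict, permuted, labels) - baseline)
    return importances

# Toy data (assumption): column 0 carries the label, column 1 is pure noise.
data_rng = random.Random(42)
labels = [i % 2 for i in range(200)]
rows = [[y + data_rng.uniform(-0.2, 0.2), data_rng.random()] for y in labels]

# Toy model (assumption): a fixed threshold on column 0; it ignores column 1.
predict = lambda row: 1 if row[0] > 0.5 else 0

importances = permutation_importance(predict, rows, labels)
# Rank variables from most to least important, like sort(..., 'descend').
ranking = sorted(range(len(importances)), key=lambda j: importances[j], reverse=True)
```

Shuffling the noise column cannot change the predictions of this toy model, so its importance stays at zero, while shuffling the informative column pushes the error rate up.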

JAVA CLASSPATH setting

Source: http://weka.wikispaces.com/CLASSPATH

Win32 (2k and XP)

We assume that the mysql-connector-java-3.1.8-bin.jar archive is located in the following directory:

C:\Program Files\Weka-3-4

In the Control Panel, click on System (or right-click on My Computer and select Properties) and then go to the Advanced tab. There you will find a button called Environment Variables; click it. Depending on whether you're the only person using this computer or it is a lab computer shared by many, you can either create a new system-wide environment variable (you are the only user) or a user-dependent one (recommended for multi-user machines). Enter the following name for the variable:

CLASSPATH

and add this value:

C:\Program Files\Weka-3-4\mysql-connector-java-3.1.8-bin.jar

If you want to add additional jars, you'll have to separate them with the path separator, the semicolon ; (no spaces!).

Unix/Linux

I assume that the mysql jar is located in ...
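For the Win32 case, the "semicolon, no spaces" rule is easy to get wrong when typing the value by hand. A small Python sketch of building the value is shown here; build_classpath is a hypothetical helper, and the second jar (weka.jar) is only an illustrative entry that is not named in the instructions above.

```python
import ntpath  # Windows path rules, usable on any platform

def build_classpath(jars, separator=";"):
    """Join jar paths with the separator and no surrounding spaces."""
    cleaned = [jar.strip() for jar in jars if jar.strip()]
    return separator.join(cleaned)

weka_dir = r"C:\Program Files\Weka-3-4"
classpath = build_classpath([
    ntpath.join(weka_dir, "mysql-connector-java-3.1.8-bin.jar"),
    ntpath.join(weka_dir, "weka.jar"),  # illustrative second jar (assumption)
])
print(classpath)
```

The printed string is what would go into the CLASSPATH variable's value field.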

java memory allocation memory heap size control -Xmx -Xms

Taken from: http://javahowto.blogspot.com/2006/06/6-common-errors-in-setting-java-heap.html

Two JVM options are often used to tune JVM heap size: -Xmx for maximum heap size, and -Xms for initial heap size. Here are some common mistakes I have seen when using them:

Missing m, M, g or G at the end (they are case insensitive). For example,

java -Xmx128 BigApp
java.lang.OutOfMemoryError: Java heap space

The correct command should be: java -Xmx128m BigApp. To be precise, -Xmx128 is a valid setting for very small apps, like HelloWorld. But in real life, I guess you really mean -Xmx128m

Extra space in JVM options, or incorrect use of =. For example,

java -Xmx 128m BigApp
Invalid maximum heap size: -Xmx
Could not create the Java virtual machine.

java -Xmx=512m HelloWorld
Invalid maximum heap size: -Xmx=512m
Could not create the Java virtual machine.

The correct command should be java -Xmx128m BigApp, with neither whitespace nor =.

-X options are different than -Dkey...
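The first two mistakes listed above (missing unit suffix, stray space or =) can be caught before launching the JVM. The Python below is a rough sketch of such a check, not the JVM's own parser: parse_heap_option is a hypothetical helper, and accepting a k/K suffix is an addition beyond the m/M/g/G suffixes mentioned in the text.

```python
import re

# Multipliers for the optional unit suffix; no suffix means plain bytes.
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
_PATTERN = re.compile(r"^-(Xmx|Xms)(\d+)([kKmMgG]?)$")

def parse_heap_option(option):
    """Return (option name, size in bytes) or raise ValueError if malformed."""
    match = _PATTERN.match(option)
    if match is None:
        raise ValueError("Invalid heap size option: " + option)
    name, digits, unit = match.groups()
    return name, int(digits) * _UNITS[unit.lower()]

print(parse_heap_option("-Xmx128m"))  # 128 megabytes
print(parse_heap_option("-Xmx128"))   # 128 bytes: probably not what was meant
```

An option such as -Xmx=512m, or a bare -Xmx left behind by a stray space, does not match the pattern and raises ValueError.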

Matlab plot graph

To plot
=====
plot(100*codingCov, 100*noncodingCov, '.');

Change the size of default figure
=================
figure
set(0, 'DefaultFigurePosition', [leftPos bottomPos width height]);

To limit the axis value
=================
xlim([0 100]);
ylim([0 100]);

Mark or tick each point of axis as you wish
============================
stateName = {'state1'; 'state2'; 'state3'; 'state4'};
set(gca, 'XTickLabel', stateName)

Interactive graph that shows a message on click
================================
Override or select the default callback function for the mouse event. The message must be a cell array.

function output_txt = myCallback(obj,event_obj)
% Display the position of the data cursor
% obj          Currently not used (empty)
% event_obj    Handle to event object
% output_txt   Data cursor text string (string or cell array of strings).
fnameStat = '../gene.features/allMotifCNC.stat';
[ covCoding covNonCodin...

3 steps for p-value ( p value ) analysis

p-value:
=======
Probability (or the area) at the tail of a bell-shaped curve, where
center of bell = population mean,
marker = sample mean,
p-value = area remaining at the tail, beyond the marker

Intuition:
=========
The lower the area ==> the greater the distance between the centre and the sample mean. So, we can reject the null hypothesis.

Steps to calculate p-value

1. Assume null hypothesis (i.e. population mean)
====================================
It (actually the opposite of this) will be the Null Hypothesis

2. Calculate sample statistics
=====================
Now, calculate statistics from the available data.

3. Calculate p-value
==================
If p-value is small ==> distance between population mean and sample mean is high ==> Reject Null Hypo
If p-value is big ==> distance between population mean and sample mean is small ==> Fail to Reject Null Hypo
...
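The three steps can be tried out with a small calculation. The Python below is a sketch under assumptions that the notes above do not state: the population standard deviation is known (a z-test), the sample mean is approximately normal, and the numbers (mean 100, standard deviation 15, 25 observations, 0.05 cut-off) are made up for illustration. z_test_p_value is a hypothetical helper name.

```python
import math

def z_test_p_value(sample, population_mean, population_sd, two_sided=False):
    """Tail area of the normal curve beyond the observed sample mean."""
    n = len(sample)
    sample_mean = sum(sample) / n
    # Distance between sample mean and population mean, in standard errors.
    z = (sample_mean - population_mean) / (population_sd / math.sqrt(n))
    # Area in the tail on the side where the sample mean fell.
    tail = 0.5 * math.erfc(abs(z) / math.sqrt(2))
    return 2 * tail if two_sided else tail

# Step 1: null hypothesis, population mean = 100 (illustrative value).
# Step 2: sample statistics from made-up data with mean 106.
sample = [106.0] * 25
# Step 3: p-value, compared against an assumed 0.05 cut-off.
p = z_test_p_value(sample, population_mean=100.0, population_sd=15.0)
reject_null = p < 0.05
print(p, reject_null)
```

Here the sample mean lies two standard errors from the population mean, so the tail area is small and the null hypothesis is rejected at the assumed cut-off.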

bioinformatics algorithm

1. Global alignment
===================
Needleman–Wunsch algorithm

2. Local alignment
===================
Smith–Waterman algorithm
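To show how the two differ, here is a minimal Python sketch that computes only the optimal alignment score (no traceback) for each algorithm. The scoring scheme (match +1, mismatch -1, linear gap penalty -1) is an assumption chosen for illustration, and the function names are made up for this sketch.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best global alignment score: both sequences are aligned end to end."""
    prev = [j * gap for j in range(len(b) + 1)]  # row 0: b aligned to gaps
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: a aligned to gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]

def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best local alignment score: the best-matching pair of substrings."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Scores never drop below 0, so an alignment can restart anywhere.
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best

global_score = needleman_wunsch_score("ACGT", "AGT")
local_score = smith_waterman_score("TTTACGTTT", "GGACGGG")
print(global_score, local_score)
```

The only differences are the initialisation (gap penalties versus zeros), the floor at 0 in the local version, and where the final score is read from (last cell versus the maximum cell).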