My Blog

Posts

Showing posts from October, 2011

matlab normalization

Using meand stddev function [featureIn,meanFeatIn, stdDevFeatIn] = mynorm_train(featureIn) meanFeatIn = mean(featureIn,1); stdDevFeatIn = std(featureIn,1,1); noSample = size(featureIn,1); for i=1:noSample featureIn(i,:) = (featureIn(i,:) - meanFeatIn) ./ stdDevFeatIn ; end end function [testFeatureIn] = mynorm_test(testFeatureIn,meanFeatIn,stdDevFeatIn) noSample = size(testFeatureIn,1); noInputFeat = size(testFeatureIn,2); for i=1:noSample testFeatureIn(i,1:noInputFeat) = (testFeatureIn(i,1:noInputFeat) - meanFeatIn ) ./ stdDevFeatIn; end end Using range function [ N_feature,feature_range,feature_bases ] = normalize( features ) %NORMALIZE Summary of this function goes here % Detailed explanation goes here % samples are in rows for NoF = 1:size(features,2) F_min(NoF) = min(features(:,NoF)); F_max(NoF) = max(features(:,NoF)); feature_range(NoF) = (F_max(NoF)-F_min(NoF))/2; feature_ba

matlab feature ranking

used function rankfeatures (consider sample as column) ==================================================== train = [trainFeature trainLabel]; [IDX ,Z] = rankfeatures( trainFeature' , trainLabel' ,' Criterion ', 'ttest' ); % ttest / entropy/ etc... topRankedFeature = (size( trainLabel ,1)) / 2 ; classify( testFeature ( :,IDX(1:topRankedFeature) ) , ... trainFeature( :,IDX(1:topRankedFeature) ) , trainLabel , ... ' diagquadratic' ) % liner/quadratic/ diagquadratic etc % transpose as it takes sample as column vector % ttest / entropy/ etc... %IDX is the list of indices to the rows in X with the most significant features. %Z is the absolute value of the criterion used (see below)

MATLAB discriminant analysis

function used classify =================== predictedClass = classify(testSample,trainSample,trainGroup, diagquadratic ) % liner/quadratic/ etc

MATLAB check unique string in file

function identifyDuplicate clc; uniqueSeq={}; dupSeq={}; index=1; uniqueIndex=1; dupIndex=1; uniq=[]; dup=[]; isDuplicated = 0; fid = fopen('1400M_from_287PS_287NS.ranked','r'); tline = fgetl(fid); % ****** while ischar(tline) consensusSeq = fgetl(fid); % Consessus: AAACC consensusSeq = upper(consensusSeq); curSeq = sscanf(consensusSeq,'%*s %s', [1, inf]); curSeq = upper(curSeq); fgetl(fid); % Threshold fgetl(fid); % Coverage fgetl(fid); % p-value fgetl(fid); % r1 fgetl(fid); % r2 fgetl(fid); % r3 fgetl(fid); % r4 isExist=0; for en=1:uniqueIndex -1 exist = strcmp(curSeq,uniqueSeq{en}) if exist ==1 isDuplicated = 1; break; end end if( isDuplicated == 1 ) % already exist dupSeq{dupIndex} = {curSeq}; dupIndex = dupIndex + 1; dup = [dup;index]; else % not found

MATLAB cross validation

// use built-in function samplesize = size( matrix , 1); c = cvpartition(samplesize, 'kfold' , k); % return the indexes on each fold ///// output in matlab console K-fold cross validation partition N: 10 NumTestSets: 4 TrainSize: 8 7 7 8 TestSize: 2 3 3 2 ////////////////////// for i=1 : k trainIdxs = find(training(c,i) ); %training(c,i); // 1 means in train , 0 means in test testInxs = find(test(c,i) ); % test(c,i); // 1 means in test , 0 means in train trainMatrix = matrix ( matrix(trainIdxs ), : ); testMatrix = matrix ( matrix(testIdxs ), : ); end //// now calculate performance %% calculate performance of a partition selectedKfoldSen=[];selectedKfoldSpe=[];selectedKfoldAcc=[]; indexSen=1;indexSpe=1;indexAcc=1; if ( kfold == (P+N) )% leave one out sensitivity = sum(cvtp) /( sum(cvtp) + sum(cvfn) ) specificity = sum(cvtn) /( sum(cvfp) + sum(cvtn) ) acc

MATLAB distance based learning

kNN ========= all= [ 1 2 ; 3 4 ; 5 6 ; 7 8; 9 10]; newpoint = [ 1 7]; [indexes,distances] = knnsearch(all , newpoint,'k', 3) % 3 nearest neighbour

MATLAB confusion matrix

% test_class & predicted_class must be same dimension % 'order' - describes the order of label. Here labels are 'g' as positive and 'h' as negative [C,order] = confusionmat( test_class(1: noSampleTest), predicted_class, 'order', ['g' ;'h'] ) tp = C(1,1); fn = C(1,2); fp = C(2,1); tn = C(2,2); sensitivity = tp /( tp + fn ) specificity = tn /( fp + tn ) accuracy = (tp+tn) / (tp+fn+fp+tn) tpr = sensitivity fpr = 1-specificity precision = tp /( tp + fp ) fVal = (2*tpr*precision)/(tpr+precision)

MATLAB string manipulation

string cell array ================ ar = {'aa';'bbbbb'} mat2str (convert number to string) ====================== parameter = '-s 0 -t 0 -c '; for c=1:2 nextp = [parameter mat2str( c) ] end string comparison ================ sa='ab'; sb='aba'; t = strcmp(sa , sb ) if( t == 1) display('Yest match'); else display('No match') end strmatch find exact string in cell array ======================= list = {'max', 'minimax', 'maximum', 'max'} x = strmatch('max',list,'exact') find string starts with in cell array ======================= list = {'max', 'minimax', 'maximum', 'max'} x = strmatch('max',list) String Comparison ================ strmatch('ab' , 'abc') ; % return 1 strcmp('ab' , 'abc') ; % return0 Find index of

MATLAB normalize train and test

matrixTrain = load('matrix.train'); function [matrixTrain , meanFeatIn, stdDevFeatIn] = mynorm_train(matrixTrain) featureIn = matrixTrain(:,1:end-1); featureOut = matrixTrain(:,end); meanFeatIn = mean(featureIn,1); stdDevFeatIn = std(featureIn,1,1); meanFeatOut = mean(featureOut,1); stdDevFeatOut = std(featureOut,1,1) ; dlmwrite('normInfo',[meanFeatOut stdDevFeatOut],'delimiter','\t'); noSample = size(featureIn,1); for i=1:noSample featureIn(i,:) = (featureIn(i,:) - meanFeatIn) ./ stdDevFeatIn ; featureOut(i,:) = (featureOut(i,:) - meanFeatOut) ./ stdDevFeatOut ; end matrixTrain = [ featureIn featureOut]; end matrixTest = load('matrix.test'); function [matrixTest] = mynorm_train(matrixTest,meanFeatIn, stdDevFeatIn,meanFeatOut ,stdDevFeatOut ) noSample = size(matrixTest,1); noInputFeat = size(matrixTest,2) - 1; for i=1:noSample matrixTest(i,1:noInputFeat) = (matrixTest(i,1:noI

matlab matrix to weka .arff format conversion

inputFormat ( matrix are tab seperated , last column indicates the label, This is for two class problem, for multiclass need to change the code in few lines) ====================================================================== 5.0 6.5 7.9 +1 6.6 8.9 6.1 -1 code ======= function matlabToarff % convert matrix int arff(Attribute relation file format )format clc; fNameData = 'seqLabel'; fNameARFF = 'seqLabel.arff'; fidARFF = fopen( fNameARFF ,'w'); matrix = load(fNameData); feature = matrix ( : , 1:end-1); label = matrix (: , end) ; noFeature = size(feature,2); noSample = size(feature,1); %%%%%%%%%% header fprintf(fidARFF,'%s\n\n','@RELATION LNCRNAsequence'); for i=1:noFeature % noFeature fprintf(fidARFF,'%s\t%d\t%s\n' ,'@ATTRIBUTE' , i, 'NUMERIC' ); end fprintf(fidARFF,'%s\n\n','@ATTRIBUTE class {+1,-1 }'); %%%%%%%%%% data fprintf(fidARFF,'%s\n','@DA