
LINUX shell script tutorial

 

1. Text Processing 

  • Cut: select only the given columns (1-based index)


# count the unique values in the 7th column

cut -f 7  amel_OGSv3.2.gff3 | sort -u | wc -l

The default delimiter is tab

cut   -f 1,3  fName

If another delimiter is used, pass it with -d

cut   -f 1,3 -d ':'  fName
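A minimal sketch of both cut commands on a small throwaway file. The file contents and variable names are made up for illustration; only the cut, sort and wc options come from the notes above.

```shell
# Build a tiny tab-separated demo file (hypothetical data).
demo=$(mktemp)
printf 'chr1\tgene\t100\nchr1\texon\t150\nchr2\tgene\t300\n' > "$demo"

# Columns 1 and 3, using the default tab delimiter.
cols=$(cut -f 1,3 "$demo")
echo "$cols"

# Number of distinct values in column 2.
unique_count=$(cut -f 2 "$demo" | sort -u | wc -l)
echo "$unique_count"

# Same idea with an explicit delimiter.
first_field=$(printf 'a:b:c\n' | cut -f 1 -d ':')
echo "$first_field"

rm -f "$demo"
```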

  • GREP

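Copy the lines containing the word "gene" (case-insensitive) to a new file

# Without -w, grep matches the pattern anywhere, including inside longer words.
# Use -w to match whole words only. In a regular expression "gene*" means "gen"
# followed by zero or more "e", so plain "gene" is used here.
grep -iw "gene" amel_OGSv3.2.gff3 > ./amel_OGSv3.2.gene.gff3

# find the word -9 in a file ([-] keeps grep from reading -9 as an option)
grep -w "[-]9" fname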

Match any of several words in a file

 grep 'good\|bad' test.txt

The following command prints from 1 line before the match through 1000 lines after it.

  grep "^AC P0001" factor.table -B1 -A1000 > out.txt

The pattern "^AC P0001" is a regular expression. The caret (^) anchors the match to the start of a line, so the quoted text finds the line that starts with the dealer name AC P0001. -B1 prints 1 line before the match, and -A1000 prints 1000 lines after it.
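The grep options above can be tried on a small file. This is only a sketch: the file contents are invented, and -E with "|" is used for alternation as a portable alternative to the "\|" form shown earlier.

```shell
# Throwaway input (hypothetical data).
demo=$(mktemp)
printf 'header\nAC P0001\nline a\nline b\nAC P0002\n' > "$demo"

# One line before the match and two lines after it.
context=$(grep -B1 -A2 "^AC P0001" "$demo")
context_lines=$(printf '%s\n' "$context" | wc -l)
echo "$context"

# Whole-word, case-insensitive match: "genes" is not counted.
word_hits=$(printf 'gene\ngenes\nGene one\n' | grep -ciw "gene")
echo "$word_hits"

# Lines containing either of two words.
either_hits=$(grep -c -E 'line a|line b' "$demo")
echo "$either_hits"

rm -f "$demo"
```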



  • SORT , 1 BASED INDEX


Sort on multiple columns (column 1 as text, then column 2 numerically)
sort -k1,1 -k2,2n   input.txt 

To sort in descending order, add r (the default is ascending)
sort -k1,1r  -k2,2nr   input.txt
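A short sketch of the two sort commands on invented data, to show why the n flag matters for the second key (without it, "10" would sort before "9").

```shell
# Two whitespace-separated columns (hypothetical data).
demo=$(mktemp)
printf 'b 2\na 10\na 9\nb 1\n' > "$demo"

# Column 1 ascending as text, then column 2 ascending as a number.
ascending=$(sort -k1,1 -k2,2n "$demo")
echo "$ascending"

# Both keys descending.
descending=$(sort -k1,1r -k2,2nr "$demo")
echo "$descending"

rm -f "$demo"
```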

  • SED


Print lines 10 to 20 of a file
sed -n '10,20p' fName
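As a quick check of the line range, the sketch below runs the same sed expression on a generated file of 30 numbered lines (the file is a stand-in, not part of the original notes).

```shell
# A file whose line N contains the number N.
demo=$(mktemp)
seq 1 30 > "$demo"

# -n suppresses the default output; p prints only lines 10 through 20.
selected=$(sed -n '10,20p' "$demo")
selected_count=$(printf '%s\n' "$selected" | wc -l)
first_line=$(printf '%s\n' "$selected" | head -n 1)
last_line=$(printf '%s\n' "$selected" | tail -n 1)
echo "$selected_count lines, from $first_line to $last_line"

rm -f "$demo"
```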

  • AWK, 1-based index (by default AWK uses whitespace as the delimiter; to change it, set FS=YourDelimiter or pass -F)


For CSV file,

( if second column is 420 and third column is 1 in a CSV file, then select those lines )
awk '$2==420 && $3==1'  FS=, input.csv  > output.csv

 awk '$2 > 0' input.txt > output.txt

# if field 3 equals "exon", then output field 20
awk '{ if($3=="exon" ){print $20 }}' input.txt | sort -u  > output.txt

# if field 3 equals "exon", then output the whole line (print with no arguments prints all fields)
awk '{ if($3=="exon" ){print  }}' input.txt   > output.txt

# passing a shell variable to awk: use -v instead of splicing it into the program with quotes
# if field 5 is greater than or equal to the variable, print the line
awk -v limit="$variable" '{ if($5 >= limit ){print  }}'  input.txt  >   output.txt
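A small sketch of the CSV filter and of passing a shell value into awk with -v. The CSV rows and the names threshold and min are illustrative assumptions.

```shell
# Three CSV rows without a header (hypothetical data).
demo=$(mktemp)
printf 'a,420,1\nb,420,0\nc,7,1\n' > "$demo"

# Rows whose 2nd field is 420 and whose 3rd field is 1.
exact=$(awk -F, '$2==420 && $3==1' "$demo")
echo "$exact"

# Rows whose 2nd field is at least the shell variable's value.
threshold=100
at_least=$(awk -F, -v min="$threshold" '$2 >= min' "$demo")
echo "$at_least"

rm -f "$demo"
```

With -v the value arrives as an awk variable, so the awk program can stay inside a single pair of single quotes.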
  •   Loop in shell


# Loop over an array using the index

declare -a myarray=("GM12878" "K562" "H9ES")
# indexes run from 0 to 2 for three elements
for i in `seq 0 2`
do
    echo $i
    echo ${myarray[$i]}
done


# Loop over all files inside a directory

totCount=0
# let the shell expand the glob into the array; parsing ls output breaks on names with spaces
arrInput=("$dirOntologizerInputTarget"/*.input)  # ("E003"  "E008" )
for filename in "${arrInput[@]}"
do
    namePrefix=$(basename "$filename" ".input")
    totCount=$(($totCount+1))
    echo $namePrefix
done
echo $totCount

# From a range of numbers

i=1;
echo $i;

max=999999
for i in `seq 2 $max`
do
    echo "$i"
done

#  From a list
declare -a myarr=("element1" "element2" "element3")
## now loop through the above array
for i in "${myarr[@]}"
do
   echo $i
done

# From an array filled element by element

allStim[0]="mouse_macrophage_TB_infection_IFNg.counts.csv"
allStim[1]="mouse_macrophage_TB_infection_IL4.counts.csv"
allStim[2]="mouse_macrophage_TB_infection_IL13.counts.csv"
allStim[3]="mouse_macrophage_TB_infection_IL4-IL13.counts.csv"

# for test in  $allStim
for i in `seq 0 3`
do      
    # echo $i;
    echo ${allStim[$i]}
  
done


# From an array

declare -a arr=("GM12878" "K562" "H9ES")
for i in "${arr[@]}"
do
   # echo $i
   cellLine=$i
     
done
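The index-based loops above depend on knowing the array length in advance. A possible alternative in bash, sketched here with the same cell line names, is to ask the array for its own indexes with "${!array[@]}" and its length with "${#array[@]}". The variable names are illustrative.

```shell
# Requires bash (arrays are not part of plain POSIX sh).
cell_lines=("GM12878" "K562" "H9ES")

joined=""
# "${!cell_lines[@]}" expands to the indexes 0 1 2.
for i in "${!cell_lines[@]}"; do
    echo "$i ${cell_lines[$i]}"
    joined="$joined$i=${cell_lines[$i]};"
done

# Number of elements in the array.
count=${#cell_lines[@]}
echo "$count"
```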


  •   Break


    # inside a loop: leave the loop once the counter reaches 3
    if [ "$totCount" -eq 3 ] ; then
        # $arrInput=""
        break;
    fi
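A complete loop showing break in context. This is a sketch with made-up values: the loop would visit 1 through 6 but stops when it reaches 3.

```shell
visited=""
for n in 1 2 3 4 5 6; do
    if [ "$n" -eq 3 ]; then
        # Leave the loop immediately; 3 and later values are never recorded.
        break
    fi
    visited="$visited$n "
done
echo "visited: $visited"
```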
  •   If else  in shell

Example 1
checkOntologizer=false

if [ "$checkOntologizer" = true ] ; then
     # do your work (":" is a placeholder; an empty then-block is a syntax error)
     :
fi

Example 2

# Prompt for the user's age...
echo "Please enter your age:"
read AGE

if [ "$AGE" -lt 20 ] || [ "$AGE" -ge 50 ]; then
 echo "Sorry, you are out of the age range."
elif [ "$AGE" -ge 20 ] && [ "$AGE" -lt 30 ]; then
 echo "You are in your 20s"
elif [ "$AGE" -ge 30 ] && [ "$AGE" -lt 40 ]; then
 echo "You are in your 30s"
elif [ "$AGE" -ge 40 ] && [ "$AGE" -lt 50 ]; then
 echo "You are in your 40s"
fi
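For testing without typing input, the same age ranges can be wrapped in a function that takes the age as an argument. This is a sketch: the function name age_group and its short return strings are hypothetical, and each elif relies on the earlier branches having already excluded lower values.

```shell
# Hypothetical helper: prints the age bracket for the number given as $1.
age_group() {
    local age=$1
    if [ "$age" -lt 20 ] || [ "$age" -ge 50 ]; then
        echo "out of range"
    elif [ "$age" -lt 30 ]; then
        echo "20s"
    elif [ "$age" -lt 40 ]; then
        echo "30s"
    else
        echo "40s"
    fi
}

age_group 25
age_group 55
```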


Rename all files with one extension to another extension


dirCheck="/home/alamt/F5shortRNA/result_v3/resultOntologizer/functionBasedOnTarget/backgroundHuman/"
extTSV=".tsv"
newExt=".xls"

# expand the glob directly instead of parsing ls output
arrInput=("$dirCheck"/*"$extTSV")  # ("E003"  "E008" )

totCount=0
for filename in "${arrInput[@]}"
do
    namePrefix=$(basename "$filename" "$extTSV")
    totCount=$(($totCount+1))
    mv "$dirCheck/$namePrefix$extTSV" "$dirCheck/$namePrefix$newExt"
    #if [ "$totCount" -ge 2 ] ; then
    #    break;
    #fi   
done   
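The same rename can be sketched without basename by stripping the old extension with the "${path%.tsv}" parameter expansion. The temporary directory and file names below are invented so the example is safe to run anywhere.

```shell
# Scratch directory with two hypothetical .tsv files.
work_dir=$(mktemp -d)
touch "$work_dir/E003.tsv" "$work_dir/E008.tsv"

renamed=0
for path in "$work_dir"/*.tsv; do
    # Skip the literal pattern if no file matched.
    [ -e "$path" ] || continue
    # "${path%.tsv}" removes the trailing .tsv before the new extension is added.
    mv -- "$path" "${path%.tsv}.xls"
    renamed=$((renamed + 1))
done

xls_files=$(ls "$work_dir")
echo "$xls_files"

rm -rf "$work_dir"
```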

2. Soft link (use a soft link for a big file when we do not want to copy it to multiple places)


ln -s basic.file softlink.file
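A brief sketch showing that the link is only a pointer: reading through it returns the original file's content, and readlink reports where it points. The scratch directory and file content are assumptions for the demo.

```shell
work_dir=$(mktemp -d)
echo "large data" > "$work_dir/basic.file"

# Create the soft link next to the original file.
ln -s "$work_dir/basic.file" "$work_dir/softlink.file"

# The link resolves to the original path, and reading it reads the original.
link_target=$(readlink "$work_dir/softlink.file")
link_content=$(cat "$work_dir/softlink.file")
echo "$link_target"
echo "$link_content"

rm -rf "$work_dir"
```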

3. vi editor


http://www.guckes.net/vi/substitute.html

a. Replace all occurrences of a word with another

: %s/old/new/g

Remove "> " (including the space), i.e. "> ab" becomes "ab"
: %s/> //g

In vi, grouping parentheses in a pattern must be escaped as \( \)

%s/_\(.\)*/_HUMAN/g    ==> i.e. abc_EXT --> abc_HUMAN


FIND: search for a file by exact name or by pattern


locate: simplest, only reports the location, less powerful
http://www.codecoffee.com/tipsforlinux/articles/20.html

find: robust, many options, more powerful

http://www.codecoffee.com/tipsforlinux/articles/21.html


find /home/alamt/  -name file.txt

# quote the pattern so the shell does not expand it before find sees it
find /home/alamt/  -name 'file*.txt'
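The pattern search can be tried in a scratch directory. The directory layout below is made up; the point of the sketch is that find descends into subdirectories and that the pattern is quoted.

```shell
work_dir=$(mktemp -d)
mkdir -p "$work_dir/sub"
touch "$work_dir/file.txt" "$work_dir/sub/file2.txt" "$work_dir/notes.md"

# Matches file.txt and sub/file2.txt, but not notes.md.
matches=$(find "$work_dir" -name 'file*.txt' | sort)
match_count=$(printf '%s\n' "$matches" | wc -l)
echo "$matches"

rm -rf "$work_dir"
```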


Comment multiple lines of a shell script using the vi editor

Comment from line 50 to 100

:50,100s/^/#


Uncomment from line 50 to 100

:50,100s/^#/
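Outside the editor, the same line-range substitutions can be applied with sed, which is a different tool from vi but accepts the same "start,end s/old/new/" form. The sketch below uses a generated five-line script and the range 2 to 4 instead of 50 to 100.

```shell
# A five-line stand-in script: "echo 1" ... "echo 5".
script=$(mktemp)
seq 1 5 | sed 's/^/echo /' > "$script"
original=$(cat "$script")

# Comment lines 2 to 4 by inserting # at the start of each line.
commented=$(sed '2,4s/^/#/' "$script")
echo "$commented"

# Uncomment the same range by removing the leading #.
restored=$(printf '%s\n' "$commented" | sed '2,4s/^#//')

rm -f "$script"
```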

4. WGET to download files recursively



cd /projects/dragon/FANTOM5/
wget -r --user myid  --password mypass --force-html -i https://fantom5-collaboration.gsc.riken.jp/webdav/lncRNAome_draft/Human/data/catalog/



5. Background and foreground process management

NOHUP: Linux process management in foreground and background using nohup


http://stackoverflow.com/questions/9190151/how-to-run-a-shell-script-in-the-backgroung-and-get-no-output

http://linuxg.net/how-to-manage-background-and-foreground-processes/

http://unix.stackexchange.com/questions/45025/how-to-suspend-and-bring-a-background-process-to-foreground


How to run a job in background
======================
nohup /path/to/your/script.sh > /dev/null 2>&1 &


How to view background process
======================
jobs


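How to move a job between foreground and background
=======================================

# bring a background job to the foreground
fg %jobid

# suspend the foreground job with Ctrl+z, then resume it in the background
bg %jobid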

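How to kill background jobs
======================
# kill sends SIGTERM by default; use -9 (SIGKILL) only if the job ignores it.
# Signal 19 is SIGSTOP on x86 and ARM Linux, which pauses the job instead of ending it.
kill %job_id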
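In a script, where %jobid job specifiers are usually unavailable, the process ID stored in $! can be used instead. A minimal sketch, using sleep as a stand-in for a long-running job:

```shell
# Start a stand-in long-running job in the background.
sleep 30 &
pid=$!

# Ask it to terminate (SIGTERM is the default signal).
kill "$pid"

# wait returns 128 + the signal number when the job was ended by a signal.
status=0
wait "$pid" || status=$?
echo "exit status: $status"
```

An exit status of 143 (128 + 15) indicates the job was ended by SIGTERM rather than finishing on its own.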


RSYNC (copy from local/remote to local/remote with synchronization)


rsync -avH --progress  alamt@10.70.58.115:/home/alamt/CNC     /destfolder/

Screen


http://news.softpedia.com/news/GNU-Screen-Tutorial-44274.shtml

  • Keep in mind that screen's default command character is Ctrl+a (hold the Ctrl key, press a, then release both; written Ctrl-a).
  • Moreover, the command (letter) entered after Ctrl+a is case sensitive, so for example Ctrl+a n is a different command from Ctrl+a N.

1. open
===============
screen
OR screen -S screenname

2. List screens
===============
screen -ls

3. Enter into a screen
================

screen -r screenID

4. Detach from a screen
==============
ctrl-a d

5. Kill permanently
==============
ctrl-a k   (lowercase k in the default key bindings; some configurations rebind it to K)

OR

screen -X -S screenname kill  (tested killing)

6. Create a new terminal/shell under a screen
============================
If you work in several programs, you can create a separate terminal for each inside one screen.

# create a new terminal
ctrl-a c

# switch among terminals; terminals 0 to 9 can be selected directly by number

ctrl-a 0   (go to terminal 0)
ctrl-a 1   (go to terminal 1)
ctrl-a 9   (go to terminal 9)

ctrl-a n   (next terminal: 0 ==> 1, 1 ==> 2, and from the last one back to 0)
ctrl-a p   (previous terminal)


# Detach from the terminal and the screen

Go to a terminal where no program is running, i.e. a plain shell prompt.

Then press ctrl-a d to detach from the screen.

6. System related commands: Linux Processor and memory information


To get the number of processors
====================

grep -c '^processor' /proc/cpuinfo

To get the details of each processor and the number of cores
=======================
# very detailed information
less /proc/cpuinfo

# summary information: how many cores/processors etc.

lscpu
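For use inside a script, the processor count can be captured in a variable. This sketch assumes a Linux system where /proc/cpuinfo lists one "processor" line per logical CPU and where the coreutils nproc command is installed. The two numbers can differ, because nproc reports only the CPUs available to the current process.

```shell
# Count the "processor : N" entries in /proc/cpuinfo.
cpu_count=$(grep -c '^processor' /proc/cpuinfo)
echo "logical processors: $cpu_count"

# CPUs available to this process, as reported by coreutils.
usable_count=$(nproc)
echo "usable processors: $usable_count"
```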

To get the RAM size
=============

1. In human readable format

free -h

In gigabytes or in megabytes

free -g
free -m

2. In more details

less /proc/meminfo 
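To pull a single figure out of /proc/meminfo in a script, awk can select the MemTotal line. A sketch assuming the usual Linux layout, where the second field of that line is the size in kB:

```shell
# Total RAM in kB, taken from the "MemTotal:" line.
mem_total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)

# Integer conversion to MB (rounded down).
mem_total_mb=$((mem_total_kb / 1024))
echo "total RAM: ${mem_total_mb} MB"
```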


To get the disk size
=============

1. In human readable format

df -h

Total size of a folder
===============
Enter the folder and then run (-s prints a single summary line, so no grep is needed)
du -sh .
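du also accepts a path, so changing into the folder first is optional. The sketch below measures a scratch directory (created only for the demo) and keeps just the size column of the output.

```shell
work_dir=$(mktemp -d)
printf 'hello\n' > "$work_dir/a.txt"

# du prints "<size><TAB><path>"; cut keeps the size.
total=$(du -sh "$work_dir" | cut -f 1)
echo "folder size: $total"

rm -rf "$work_dir"
```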


