LINUX shell script tutorial

1. Text Processing

CUT: select only the chosen columns (1-based index)

// count the unique values in the 7th column

cut -f 7 amel_OGSv3.2.gff3 | sort -u | wc -l

The default delimiter is a tab.

cut -f 1,3 fName

If the file uses another delimiter, pass it with -d.

cut -f 1,3 -d ':' fName
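
A minimal sketch of both forms on throwaway data. The file names and contents below are made up for illustration and are not related to the GFF3 file used above.

```shell
# Build two tiny sample files in a temporary directory (hypothetical data).
tmpdir=$(mktemp -d)
printf 'chr1\tgene\t100\nchr2\texon\t200\n' > "$tmpdir/tab.txt"
printf 'a:b:c\nd:e:f\n' > "$tmpdir/colon.txt"

# Tab is the default delimiter: keep columns 1 and 3.
tab_cols=$(cut -f 1,3 "$tmpdir/tab.txt")

# Colon-delimited input needs -d ':'.
colon_cols=$(cut -f 1,3 -d ':' "$tmpdir/colon.txt")

printf '%s\n' "$tab_cols" "$colon_cols"
rm -rf "$tmpdir"
```

Note that cut keeps the input delimiter in its output, so the colon file still comes out colon-separated.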

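GREP

copy lines containing the word "gene"

// -i ignores case; -w matches the pattern only as a whole word.
// Without -w, grep also matches the pattern inside longer words (for example "genes").
// In a regular expression "gene*" means "gen" followed by zero or more "e", so use plain "gene".
grep -iw "gene" amel_OGSv3.2.gff3 > ./amel_OGSv3.2.gene.gff3

// find the word -9 in a file; the brackets stop grep from reading -9 as an option
grep -w "[-]9" fname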

grep for any of several words in a file (\| is the GNU grep "or" operator; grep -E 'good|bad' is equivalent)

grep 'good\|bad' test.txt

The following command prints from 1 line before the match through 1000 lines after the match.

  grep "^AC P0001" factor.table -B1 -A1000 > out.txt

"^AC P0001" is a regular expression. The caret (^) anchors the match to the start of a line, so the quoted text matches a line that starts with the text AC P0001. -B1 prints 1 line before the match and -A1000 prints 1000 lines after it.
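
The options above can be tried on a small file first. This is only a sketch: the sample records and file names are invented, and the context window is shrunk to 2 lines so the effect is easy to see.

```shell
tmpdir=$(mktemp -d)
printf 'ID X1\nAC P0001\nname alpha\nAC P0002\nname beta\n' > "$tmpdir/factor.txt"

# One line before and two lines after the line starting with "AC P0001".
context=$(grep -B1 -A2 "^AC P0001" "$tmpdir/factor.txt")

# -w matches whole words only: "gene" and "GENE" match, "genes" does not.
printf 'gene\ngenes\nGENE\n' > "$tmpdir/words.txt"
whole=$(grep -ciw "gene" "$tmpdir/words.txt")

printf '%s\n' "$context"
echo "whole-word matches: $whole"
rm -rf "$tmpdir"
```

Here -c makes grep print the number of matching lines instead of the lines themselves.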

SORT (1-based index)

sort on multiple columns (column 1 as text, then column 2 as a number)
sort -k1,1 -k2,2n input.txt

to sort in descending order, add r to the key (ascending is the default)
sort -k1,1r -k2,2nr input.txt
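
A small sketch showing why the n flag matters on the second key. The four sample rows are invented for illustration.

```shell
tmpdir=$(mktemp -d)
printf 'b 10\na 9\na 100\nb 2\n' > "$tmpdir/input.txt"

# Column 1 as text ascending, then column 2 as a number ascending.
asc=$(sort -k1,1 -k2,2n "$tmpdir/input.txt")

# Same keys, both descending.
desc=$(sort -k1,1r -k2,2nr "$tmpdir/input.txt")

printf '%s\n---\n%s\n' "$asc" "$desc"
rm -rf "$tmpdir"
```

Without n, the second column would be compared as text and 100 would sort before 9.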

SED

print lines 10 to 20 of a file
sed -n '10,20p' fName

AWK (1-based index; by default AWK splits fields on whitespace, set FS=YourDelimiter to change it)

For a CSV file,

( select the lines whose second column is 420 and third column is 1 )
awk '$2==420 && $3==1' FS=, input.csv > output.csv

awk '$2 > 0' input.txt > output.txt

// if field 3 equals exon, output field 20
awk '{ if($3=="exon" ){print $20 }}' input.txt | sort -u > output.txt

// if field 3 equals exon, output the whole line (print with no arguments prints all fields)
awk '{ if($3=="exon" ){print }}' input.txt > output.txt

// passing a shell variable into awk [ close the single quote, insert the double-quoted variable, reopen the single quote ]
// if field 5 is greater than or equal to the variable, print the line
awk '{ if($5 >= '"$variable"' ){print }}' input.txt > output.txt
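
An alternative way to pass a shell value into awk is the -v option, which avoids splicing quotes into the awk program. A minimal sketch, where the sample rows, the variable name thr, and the cutoff of 50 are all assumptions made for illustration:

```shell
tmpdir=$(mktemp -d)
printf 'a b c d 5\na b c d 50\na b c d 500\n' > "$tmpdir/input.txt"

threshold=50   # hypothetical cutoff

# -v copies the shell value into the awk variable "thr" before the program runs.
kept=$(awk -v thr="$threshold" '$5 >= thr' "$tmpdir/input.txt")

printf '%s\n' "$kept"
rm -rf "$tmpdir"
```

A pattern with no action block, as used here, prints every line for which the condition is true.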

Loop in shell

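# Loop over an array using its index

declare -a myarray=("GM12878" "K562" "H9ES")
# The array has 3 elements, so the valid indexes are 0 to 2; ${!myarray[@]} lists them.
for i in "${!myarray[@]}"
do
    echo "$i"
    echo "${myarray[$i]}"
done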

# Loop over all files inside a folder

totCount=0
# Expand the glob straight into an array; parsing ls output breaks on names with spaces.
arrInput=("$dirOntologizerInputTarget"/*.input)
for filename in "${arrInput[@]}"
do
    namePrefix=$(basename "$filename" ".input")
    totCount=$((totCount+1))
    echo "$namePrefix"
done
echo "$totCount"

# Loop over a range of numbers

i=1;
echo $i;

max=999999
for i in `seq 2 $max`
do
    echo "$i"
done

# From a list
declare -a myarr=("element1" "element2" "element3")
## now loop through the above array
for i in "${myarr[@]}"
do
   echo $i
done

# From an array filled one element at a time

allStim[0]="mouse_macrophage_TB_infection_IFNg.counts.csv"
allStim[1]="mouse_macrophage_TB_infection_IL4.counts.csv"
allStim[2]="mouse_macrophage_TB_infection_IL13.counts.csv"
allStim[3]="mouse_macrophage_TB_infection_IL4-IL13.counts.csv"

# for test in $allStim
for i in `seq 0 3`
do
    # echo $i;
    echo ${allStim[$i]}

done

# From an array

declare -a arr=("GM12878" "K562" "H9ES")
for i in "${arr[@]}"
do
   # echo $i
   cellLine=$i

done

Break (leave a loop early; this fragment goes inside a for or while loop)

    if [ "$totCount" -eq 3 ] ; then
        # $arrInput=""
        break;
    fi
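
The fragment above only has an effect inside a loop. A minimal self-contained sketch, in which the list of sample names is hypothetical:

```shell
totCount=0
for name in E003 E008 E012 E020 E031; do   # hypothetical sample names
    totCount=$((totCount + 1))
    echo "processing $name"
    if [ "$totCount" -eq 3 ]; then
        break    # leave the loop after the third item
    fi
done
echo "stopped after $totCount items"
```

The last two names are never processed, because break ends the loop as soon as the counter reaches 3.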

If else in shell

Example 1
checkOntologizer=false

if [ "$checkOntologizer" = true ] ; then
    # do your work (":" is a do-nothing placeholder, because an empty then-branch is a syntax error)
    :
fi

Example 2

# Prompt for the user's age...
echo "Please enter your age:"
read AGE

if [ "$AGE" -lt 20 ] || [ "$AGE" -ge 50 ]; then
 echo "Sorry, you are out of the age range."
elif [ "$AGE" -ge 20 ] && [ "$AGE" -lt 30 ]; then
 echo "You are in your 20s"
elif [ "$AGE" -ge 30 ] && [ "$AGE" -lt 40 ]; then
 echo "You are in your 30s"
elif [ "$AGE" -ge 40 ] && [ "$AGE" -lt 50 ]; then
 echo "You are in your 40s"
fi

Rename all files from one extension to another

dirCheck="/home/alamt/F5shortRNA/result_v3/resultOntologizer/functionBasedOnTarget/backgroundHuman/"
extTSV=".tsv"
newExt=".xls"

totCount=0
# Loop over the glob directly; parsing ls output breaks on names with spaces.
for filename in "$dirCheck"/*"$extTSV"
do
    namePrefix=$(basename "$filename" "$extTSV")
    totCount=$((totCount+1))
    mv "$dirCheck/$namePrefix$extTSV" "$dirCheck/$namePrefix$newExt"
    #if [ "$totCount" -ge 2 ] ; then
    #    break;
    #fi
done
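
The same rename can also be sketched with shell parameter expansion instead of basename. This version runs in a temporary directory with invented file names, so it is safe to try; dirDemo is a stand-in for dirCheck.

```shell
dirDemo=$(mktemp -d)            # stand-in for dirCheck
touch "$dirDemo/a.tsv" "$dirDemo/b.tsv" "$dirDemo/keep.txt"

for f in "$dirDemo"/*.tsv; do
    [ -e "$f" ] || continue      # skip if the glob matched nothing
    mv -- "$f" "${f%.tsv}.xls"   # strip the trailing .tsv, append .xls
done

renamed=$(ls "$dirDemo")
echo "$renamed"
rm -rf "$dirDemo"
```

${f%.tsv} removes the shortest match of .tsv from the end of the value, so files with other extensions are left alone.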

2. Soft link (use a soft link for a big file when you do not want to copy it to multiple places)

ln -s basic.file softlink.file
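
A short sketch for checking a link after creating it. It works in a temporary directory, and the file content is invented.

```shell
tmpdir=$(mktemp -d)
echo "big data" > "$tmpdir/basic.file"

# Create the link, then check where it points and that it reads the same content.
ln -s "$tmpdir/basic.file" "$tmpdir/softlink.file"
target=$(readlink "$tmpdir/softlink.file")
content=$(cat "$tmpdir/softlink.file")

echo "$target"
echo "$content"
rm -rf "$tmpdir"
```

The order of arguments is ln -s TARGET LINKNAME. Deleting the link leaves the original file untouched, but deleting the original leaves a dangling link.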

3. vi editor

http://www.guckes.net/vi/substitute.html

a. Replace all occurrences of a word with another

:%s/old/new/g

Remove "> " (including the space), e.g. "> ab" becomes "ab"
:%s/> //g

To group part of a pattern, enclose it in parentheses escaped with a backslash: \( \)

:%s/_.*$/_HUMAN/ ==> e.g. abc_EXT --> abc_HUMAN
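
Substitutions like these can be tried outside the editor with sed, which uses the same s/old/new/ syntax for basic patterns. A minimal sketch on invented input strings:

```shell
# Remove "> " from the text.
stripped=$(printf '> ab\n' | sed 's/> //g')

# Replace everything from the first underscore to the end of the line.
renamed=$(printf 'abc_EXT\n' | sed 's/_.*$/_HUMAN/')

# Grouping with escaped parentheses: \1 puts back the captured prefix.
grouped=$(printf 'abc_EXT\n' | sed 's/^\(.*\)_.*$/\1_HUMAN/')

echo "$stripped $renamed $grouped"
```

The grouped form differs when a name has several underscores: because .* is greedy, the capture keeps everything up to the last underscore.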

FIND: search for a file by exact name or by pattern

locate: the simplest option; it only reports file locations and is less powerful
http://www.codecoffee.com/tipsforlinux/articles/20.html

find: robust, with many options; more powerful

http://www.codecoffee.com/tipsforlinux/articles/21.html

find /home/alamt/ -name file.txt

// quote the pattern so that find, not the shell, expands the *
find /home/alamt/ -name 'file*.txt'
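
A small sketch of both searches in a temporary directory. The directory layout and file names are invented for illustration.

```shell
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/sub"
touch "$tmpdir/file.txt" "$tmpdir/sub/file2.txt" "$tmpdir/sub/other.log"

# Exact name.
exact=$(find "$tmpdir" -name 'file.txt' | wc -l | tr -d ' ')

# Pattern, quoted so that find (not the shell) expands the *.
pattern=$(find "$tmpdir" -name 'file*.txt' | wc -l | tr -d ' ')

echo "exact=$exact pattern=$pattern"
rm -rf "$tmpdir"
```

find descends into subdirectories by default, which is why the pattern search also reports the file under sub.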

Commenting out multiple lines of a shell script in the vi editor

Comment from line 50 to 100

:50,100s/^/#

Uncomment from line 50 to 100

:50,100s/^#/

4. WGET to download files recursively

cd /projects/dragon/FANTOM5/
wget -r --user myid --password mypass --force-html -i https://fantom5-collaboration.gsc.riken.jp/webdav/lncRNAome_draft/Human/data/catalog/

5. Background and foreground process management

NOHUP: Linux process management in the foreground and background using nohup

http://stackoverflow.com/questions/9190151/how-to-run-a-shell-script-in-the-backgroung-and-get-no-output

http://linuxg.net/how-to-manage-background-and-foreground-processes/

http://unix.stackexchange.com/questions/45025/how-to-suspend-and-bring-a-background-process-to-foreground

How to run a job in background
======================
nohup /path/to/your/script.sh > /dev/null 2>&1 &

How to view background process
======================
jobs

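How to move job from foreground to background and vice versa
=======================================

// bring a background or stopped job to the foreground
fg %jobid

// press Ctrl+Z to suspend the foreground job, then resume it in the background
bg %jobid

How to kill background jobs
======================
// kill sends SIGTERM by default; add -9 (SIGKILL) only if the job ignores it
// signal 19 is SIGSTOP on most Linux systems, which only pauses the job
kill %job_id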
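
Inside a script, the process ID stored in $! is often more convenient than a job number. A minimal sketch, in which sleep stands in for a real long-running script:

```shell
# Start a long-running command in the background (sleep is a stand-in).
nohup sleep 300 > /dev/null 2>&1 &
pid=$!                      # PID of the most recent background command

# kill -0 sends no signal; it only checks that the process exists.
if kill -0 "$pid" 2>/dev/null; then
    echo "job $pid is running"
fi

kill "$pid"                 # default signal is SIGTERM

# wait reaps the job and returns its exit status (non-zero after a kill).
status=0
wait "$pid" 2>/dev/null || status=$?
echo "exit status after kill: $status"
```

nohup only protects the command from the hangup signal sent when the terminal closes; an explicit kill still ends it.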

RSYNC (copy from local/remote to local/remote with synchronization)

rsync -avH --progress alamt@10.70.58.115:/home/alamt/CNC /destfolder/

Screen

http://news.softpedia.com/news/GNU-Screen-Tutorial-44274.shtml

Keep in mind that screen's default command character is Ctrl+a (hold the Ctrl key, press a, then release both).
The key entered after Ctrl+a is case sensitive, so for example Ctrl+a n is a different command from Ctrl+a N.

1. open
===============
screen
OR screen -S screenname

2. list of screen
===============
screen -ls

3. Enter into a screen
================

screen -r screenID

4. Detach from a screen
==============
ctrl-a d

5. Kill permanently
==============
// kills the current window; the session ends when its last window is closed
ctrl-a k

OR

// tested
screen -X -S screenname kill

6. Create new terminal/shell under a screen
============================
If you run several programs, create a separate terminal (window) for each one inside the screen.

// to create a new command terminal
ctrl-a c

// switch among terminals. Terminals 0 - 9 can be selected directly by number.

ctrl-a n ( next terminal: 0 ==> 1, 1 ==> 2, and so on, wrapping around after the last one )
ctrl-a p ( previous terminal )
ctrl-a 0 ... ctrl-a 9 ( jump straight to that terminal )

// Detach from terminal and screen

Go to a terminal where no program is running, i.e. one that shows a plain shell prompt.

Then press ctrl-a d to detach from the screen.

6. System related commands: Linux Processor and memory information

To get the number of processors
====================

grep -c '^processor' /proc/cpuinfo

To get the details of each processor and the number of cores
=======================
# very detailed information
less /proc/cpuinfo

# summary information: how many cores/processors etc

lscpu
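
For use in a script, the processor count can be captured in a variable. A minimal sketch, assuming a Linux system whose /proc/cpuinfo has one "processor" line per logical CPU (true on x86 and most ARM systems, but the layout varies by architecture):

```shell
# Count the "processor" entries; each logical CPU has one.
cpu_count=$(grep -c '^processor' /proc/cpuinfo)
echo "logical CPUs: $cpu_count"
```

The value can then be passed to tools that take a thread count.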

To get the RAM size
=============

1. In human readable format

free -h

Or in fixed units, gibibytes (-g) or mebibytes (-m)

free -g
free -m

2. In more detail

less /proc/meminfo

To get the disk size
=============

1. In human readable format

df -h

Total size of a folder
===============
Enter the folder and then
du -sh .
