Thursday, May 17, 2012

LINUX shell script tutorial


1. Text Processing 

  • Cut: select only the chosen columns (1-based index)

// count the unique values in the 7th column

cut -f 7  amel_OGSv3.2.gff3 | sort -u | wc -l

The default delimiter is tab:

cut   -f 1,3  fName

If another character is the delimiter, pass it with -d:

cut   -f 1,3 -d ':'  fName
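
To try the cut commands without a real GFF file, here is a small sketch on throwaway data. The file names and contents are made up for illustration only.

```shell
# Build a small tab-delimited sample file in a temporary directory (hypothetical data).
tmpdir=$(mktemp -d)
printf 'chr1\tgene\t100\nchr1\texon\t150\nchr2\tgene\t300\n' > "$tmpdir/sample.tsv"

# Columns 1 and 3, using the default tab delimiter.
cut_tab=$(cut -f 1,3 "$tmpdir/sample.tsv")
echo "$cut_tab"

# Number of distinct values in column 2 (tr strips the padding some wc versions add).
unique_count=$(cut -f 2 "$tmpdir/sample.tsv" | sort -u | wc -l | tr -d ' ')
echo "$unique_count"

# Same idea with ':' as the delimiter.
printf 'root:x:0\nalamt:x:1000\n' > "$tmpdir/sample.colon"
cut_colon=$(cut -f 1,3 -d ':' "$tmpdir/sample.colon")
echo "$cut_colon"

rm -r "$tmpdir"
```

Note that cut keeps the input delimiter between the selected columns in its output.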

  • GREP

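Copy lines containing the word "gene":

// By default grep matches the pattern anywhere in a line, including inside a longer word.
// To match whole words only, use -w. The pattern is a regular expression, not a shell glob:
// "gene*" would mean "gen" followed by zero or more "e", so the plain word is used here.
grep -iw "gene" amel_OGSv3.2.gff3 > ./amel_OGSv3.2.gene.gff3

// find the word -9 in a file ([-] keeps grep from reading -9 as an option)
grep -w "[-]9" fname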

Grep for multiple words in a file

 grep 'good\|bad' test.txt

The following command prints from 1 line before the match through 1000 lines after the match.

  grep "^AC P0001" factor.table -B1 -A1000 > out.txt

“^AC P0001” is a regular expression. The caret (^) matches the start of a line, so the quoted text finds the line that starts with the dealer name AC P0001. -B1 prints 1 line before the match and -A1000 prints 1000 lines after the match.
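
A small sketch of the context options and of -w on made-up data (the sample files below are hypothetical, not the real factor.table):

```shell
tmpdir=$(mktemp -d)
printf 'ID 1\nAC P0001\nline a\nline b\nline c\n' > "$tmpdir/factor.sample"

# 1 line before the match and 2 lines after it.
context=$(grep -B1 -A2 "^AC P0001" "$tmpdir/factor.sample")
echo "$context"

# Whole-word, case-insensitive matching: "gene" and "GENE id" match, "genes" does not.
printf 'gene\ngenes\nGENE id\n' > "$tmpdir/words.txt"
word_hits=$(grep -ciw "gene" "$tmpdir/words.txt")
echo "$word_hits"

rm -r "$tmpdir"
```

Here -c counts matching lines instead of printing them.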


Sort on multiple columns (first column as text, then second column as a number)
sort -k1,1 -k2,2n   input.txt 

To sort in descending order, add r to the key (ascending is the default)
sort -k1,1r  -k2,2nr   input.txt
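
To see what the two keys do, a quick sketch on invented data (the file content is only an example):

```shell
tmpdir=$(mktemp -d)
printf 'chr2 5\nchr1 10\nchr1 9\nchr2 40\n' > "$tmpdir/input.txt"

# Column 1 as text, then column 2 as a number, both ascending.
sorted_asc=$(sort -k1,1 -k2,2n "$tmpdir/input.txt")
echo "$sorted_asc"

# Same keys, both descending.
sorted_desc=$(sort -k1,1r -k2,2nr "$tmpdir/input.txt")
echo "$sorted_desc"

rm -r "$tmpdir"
```

Without the n flag the second column would be compared as text, so 10 would sort before 9.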

  • SED

Print a range of lines from any position (here lines 10 through 20)
sed -n '10,20p' fName
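
A quick way to check the line range, feeding sed from seq instead of a file (a minimal sketch):

```shell
# seq prints the numbers 1..30 one per line; sed keeps only lines 10 through 20.
picked=$(seq 1 30 | sed -n '10,20p')
echo "$picked"

picked_count=$(echo "$picked" | wc -l | tr -d ' ')
echo "$picked_count"
```

The -n option switches off the default printing, so only the lines selected by p are shown.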

  • AWK: 1-based field index (by default AWK uses whitespace as the delimiter; to change it, set FS=YourDelimiter)

For a CSV file:

(select the lines whose second column is 420 and whose third column is 1)
awk '$2==420 && $3==1'  FS=, input.csv  > output.csv

 awk '$2 > 0' input.txt > output.txt

// if field 3 equals "exon", output field 20
awk '{ if($3=="exon" ){print $20 }}' input.txt | sort -u  > output.txt

// if field 3 equals "exon", output all fields (a bare print prints the whole line)
awk '{ if($3=="exon" ){print  }}' input.txt   > output.txt

// passing a shell variable to awk [ close the single quote and put the variable in double quotes ]
// if field 5 is greater than or equal to the variable, print the line
awk '{ if($5 >= '"$variable"' ){print  }}'  input.txt  >   output.txt
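
Another way to hand a shell variable to awk is the -v option, which avoids breaking out of the single quotes. A minimal sketch on made-up CSV data (the variable name threshold and the file content are assumptions for the example):

```shell
tmpdir=$(mktemp -d)
printf 'a,420,1\nb,420,0\nc,7,1\n' > "$tmpdir/input.csv"

threshold=100   # hypothetical cut-off

# -F sets the field delimiter; -v copies the shell value into the awk variable min.
# Adding 0 forces a numeric comparison on both sides.
matches=$(awk -F, -v min="$threshold" '$2 + 0 >= min + 0 { print $1 }' "$tmpdir/input.csv")
echo "$matches"

rm -r "$tmpdir"
```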
  •   Loop in shell

# Loop over an array using the index

declare -a myarray=("GM12878" "K562" "H9ES")
for i in `seq 0 2`
do
    echo $i
    echo ${myarray[$i]}
done

# Loop over all files inside a directory

arrInput=( "$dirOntologizerInputTarget"/*.input )  # e.g. ("E003.input"  "E008.input")
totCount=0
for filename in "${arrInput[@]}"
do
    namePrefix=$(basename "$filename" ".input")
    echo $namePrefix
    totCount=$((totCount + 1))
done
echo $totCount

# From a range of numbers

max=5   # example upper bound
for i in `seq 2 $max`
do
    echo "$i"
done

#  From a list
declare -a myarr=("element1" "element2" "element3")
## now loop through the above array
for i in "${myarr[@]}"
do
   echo $i
done

# From an array, by index (allStim is assumed to be an array with at least 4 elements)

# for test in  $allStim
for i in `seq 0 3`
do
    # echo $i;
    echo ${allStim[$i]}
done

# From an array

declare -a arr=("GM12878" "K562" "H9ES")
for i in "${arr[@]}"
do
   echo $i
done
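
The index loops above hard-code the upper bound of seq. In bash the indices can also be taken from the array itself, so the loop cannot run past the end. A sketch (the array name cells is made up for the example; this needs bash, not plain sh):

```shell
declare -a cells=("GM12878" "K562" "H9ES")

collected=""
# "${!cells[@]}" expands to the indices of the array: 0 1 2
for i in "${!cells[@]}"
do
    echo "$i ${cells[$i]}"
    collected="$collected$i:${cells[$i]} "
done

# "${#cells[@]}" is the number of elements.
count=${#cells[@]}
echo "$count"
```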

  •   Break

    # inside a loop: stop once the counter reaches 3
    if [ "$totCount" -eq 3 ] ; then
        # arrInput=""
        break
    fi
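
A complete loop showing where the break test sits. The sample names and the variable seen are invented for this sketch.

```shell
totCount=0
seen=""
for name in E003 E008 E010 E012 E020
do
    totCount=$((totCount + 1))
    seen="$seen$name "
    # Leave the loop after the third item; the remaining names are never visited.
    if [ "$totCount" -eq 3 ] ; then
        break
    fi
done
echo "$seen"
```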
  •   If else  in shell

Example 1

if [ "$checkOntotlogizer" = true ] ; then

     # do your work
     :   # placeholder: an if body cannot be empty
fi

Example 2

# Prompt for the user's age...
echo "Please enter your age:"
read AGE

if [ "$AGE" -lt 20 ] || [ "$AGE" -ge 50 ]; then
 echo "Sorry, you are out of the age range."
elif [ "$AGE" -ge 20 ] && [ "$AGE" -lt 30 ]; then
 echo "You are in your 20s"
elif [ "$AGE" -ge 30 ] && [ "$AGE" -lt 40 ]; then
 echo "You are in your 30s"
elif [ "$AGE" -ge 40 ] && [ "$AGE" -lt 50 ]; then
 echo "You are in your 40s"
fi
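
The -lt and -ge tests report an error when the input is not an integer. One possible guard is a case pattern in front of the comparisons. This is only a sketch; the function name age_group and its messages are made up.

```shell
age_group() {
    # Reject empty input and anything containing a non-digit character.
    case "$1" in
        ''|*[!0-9]*) echo "not a number"; return 1 ;;
    esac

    if [ "$1" -lt 20 ] || [ "$1" -ge 50 ]; then
        echo "out of range"
    elif [ "$1" -lt 30 ]; then
        echo "20s"
    elif [ "$1" -lt 40 ]; then
        echo "30s"
    else
        echo "40s"
    fi
}

age_group 25
age_group 50
```

Because the branches are tested in order, each elif only needs the upper bound.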

Rename all files with one extension to another extension

# extTSV and newExt hold the old and new extensions, e.g. ".tsv" and ".txt"
for filename in "$dirCheck"/*"$extTSV"
do
    namePrefix=$(basename "$filename" "$extTSV")
    mv "$dirCheck/$namePrefix$extTSV" "$dirCheck/$namePrefix$newExt"
    #if [ "$totCount" -ge 2 ] ; then
    #    break;
    #fi
done
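
The same rename can be written without basename by stripping the old extension with parameter expansion. A sketch in a throwaway directory (the file names and the extensions .tsv and .txt are only examples):

```shell
tmpdir=$(mktemp -d)
touch "$tmpdir/E003.tsv" "$tmpdir/E008.tsv"

extTSV=".tsv"
newExt=".txt"

for filename in "$tmpdir"/*"$extTSV"
do
    # ${filename%"$extTSV"} removes the old extension from the end of the path.
    mv "$filename" "${filename%"$extTSV"}$newExt"
done

renamed=$(ls "$tmpdir")
echo "$renamed"

rm -r "$tmpdir"
```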

2. Soft link (use a soft link for a big file when you do not want to copy it to multiple places)

ln -s basic.file softlink.file
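
A short sketch to check what a soft link is and where it points. The files live in a temporary directory and the content is made up.

```shell
tmpdir=$(mktemp -d)
echo "big data" > "$tmpdir/basic.file"

# Create the link; no data is copied.
ln -s "$tmpdir/basic.file" "$tmpdir/softlink.file"

# -L is true when the path is a symbolic link.
if [ -L "$tmpdir/softlink.file" ]; then is_link=yes; else is_link=no; fi

# readlink prints the path stored in the link; cat follows the link to the real file.
link_target=$(readlink "$tmpdir/softlink.file")
link_content=$(cat "$tmpdir/softlink.file")
echo "$is_link $link_target $link_content"

rm -r "$tmpdir"
```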

3. vi editor

a. Replace all occurrences of a word with another

: %s/old/new/g

Remove "> " (including the space), i.e. "> ab" becomes "ab"
: %s/> //g

In a vi pattern, always write grouping parentheses ( ) as \( \)

: %s/_\(.\)*/_HUMAN/g    ==> i.e. abc_EXT --> abc_HUMAN

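FIND: search for a file by exact name or by pattern

locate: the simplest; it only reports the location and is less powerful

find: robust, with many options; more powerful

find /home/alamt/  -name file.txt

// quote the pattern so the shell does not expand * before find sees it
find /home/alamt/  -name 'file*.txt'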
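
A sketch of find with an exact name and with a pattern, run on a throwaway directory tree (all names below are invented):

```shell
tmpdir=$(mktemp -d)
mkdir "$tmpdir/sub"
touch "$tmpdir/file.txt" "$tmpdir/sub/file2.txt" "$tmpdir/sub/other.log"

# Exact name: only file.txt matches.
exact_hits=$(find "$tmpdir" -name file.txt | wc -l | tr -d ' ')

# Quoted pattern: find itself expands the *, in every subdirectory.
pattern_hits=$(find "$tmpdir" -name 'file*.txt' | wc -l | tr -d ' ')

echo "$exact_hits $pattern_hits"
rm -r "$tmpdir"
```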

Shell multi line comment using vi editor

Comment from line 50 to 100

: 50,100s/^/#/

Uncomment from line 50 to 100

: 50,100s/^#//

4. WGET to download files recursively

cd /projects/dragon/FANTOM5/
wget -r --user myid  --password mypass --force-html -i

5. Background and foreground process management

NOHUP: managing Linux processes in the foreground and background using nohup

How to run a job in the background
nohup /path/to/your/ > /dev/null 2>&1 &

How to view background processes
jobs

How to move a job from foreground to background and vice versa (suspend a foreground job with Ctrl-z first)

fg %jobid

bg %jobid

How to kill background jobs
// -9 is SIGKILL; on most Linux systems signal 19 is SIGSTOP, which only suspends the job
kill -9 %job_id
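
Inside a script, job numbers such as %1 are usually not available because job control is switched off there. A sketch that tracks the background process by its PID instead (sleep 30 just stands in for a long-running job):

```shell
# Start a stand-in long-running job in the background.
sleep 30 &
bg_pid=$!          # $! holds the PID of the last background command

# kill -0 sends no signal; it only checks that the process exists.
if kill -0 "$bg_pid" 2>/dev/null; then running_before=yes; else running_before=no; fi

# Ask the job to terminate (default signal is TERM), then collect it.
kill "$bg_pid"
wait "$bg_pid" 2>/dev/null || true

if kill -0 "$bg_pid" 2>/dev/null; then running_after=yes; else running_after=no; fi
echo "$running_before $running_after"
```

Plain kill gives the process a chance to clean up; kill -9 is the fallback when it does not react.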

RSYNC (copy from local/remote to local/remote with synchronization)

rsync -avH --progress  alamt@     /destfolder/


  • Keep in mind that screen's default command character is Ctrl+a (press and hold the Ctrl key, press a, then release both  =====> Ctrl-a).
  • Moreover, the command (letter) entered after Ctrl+a is case sensitive, so for example Ctrl+a n is a different command from Ctrl+a N.

1. Open a screen
screen
OR screen -S screenname

2. list of screen
screen -ls

3. Enter into a screen

screen -r screenID

4. Detach from a screen
ctrl-a d

5. Kill permanently
ctrl-a k

OR, from outside the screen (tested):
screen -X -S screenname kill

6. Create a new terminal/shell under a screen
If you work with several programs, create a separate terminal for each one inside the screen.

// to create a new command terminal
ctrl-a c

// switch among terminals; terminals 0 - 9 can be reached directly by number

ctrl-a n    ( next terminal: 0 ==> 1, 1 ==> 2, 2 ==> 3, ... )
ctrl-a 0 ... ctrl-a 9    ( jump directly to terminal 0 ... 9 )

// Detach from the terminal and the screen

Go to a terminal where no software is running, i.e. one that shows the basic shell prompt.

Then press ctrl-a d to detach from the screen.

6. System related commands: Linux Processor and memory information

To get the number of processors

grep -c processor /proc/cpuinfo

To get the details of each processor and the number of cores
# very detailed information
less /proc/cpuinfo

# summary information: how many cores/processors etc.
lscpu


To get the RAM size

1. In gigabytes (-g) or megabytes (-m); free -h prints a human-readable format

free -g
free -m

2. In more details

less /proc/meminfo 
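
When only the total RAM is needed in a script, the value can be picked out of /proc/meminfo with awk. A minimal sketch, assuming a Linux system where the file has a MemTotal line given in kB:

```shell
# The MemTotal line looks like "MemTotal:  16314208 kB"; field 2 is the number.
mem_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)

# Integer conversion to megabytes.
mem_mb=$((mem_kb / 1024))
echo "Total RAM: ${mem_mb} MB"
```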

To get the disk size

1. In human readable format

df -h

Total size of a folder
Go into the folder and then run
du -ch | grep total
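
du can also print just the total for a folder with -s, without going into it first. A sketch on a throwaway directory (the file name and size are made up; the reported size depends on the file system):

```shell
tmpdir=$(mktemp -d)
# Create a 4096-byte sample file.
head -c 4096 /dev/zero > "$tmpdir/a.bin"

# -s prints one summary line per argument, -h makes the size human readable.
# The output is "<size><TAB><path>", so cut keeps only the size.
folder_total=$(du -sh "$tmpdir" | cut -f 1)

# -k reports the size in kilobytes, which is easier to compare in scripts.
folder_kb=$(du -sk "$tmpdir" | cut -f 1)

echo "$folder_total ($folder_kb KB)"
rm -r "$tmpdir"
```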

Sunday, May 13, 2012

Java string manipulation


                strLine = brAllrna.readLine(); // A

                StringTokenizer stringTokenizer = new StringTokenizer(strLine, " \t");
                Vector pwmA = new Vector();
                while (stringTokenizer.hasMoreElements()) {

                    Double val = Double.parseDouble(stringTokenizer.nextElement().toString() );