Beginner's Unix

Length of tutorial: 3hr
Difficulty: Moderate


  • Introduce complete beginners to Unix
  • Explain the following:
      • General navigation on the command line
      • Finding things
      • Extracting information

When we are used to interacting with computers using a mouse (point and click), the notion of doing things on the command line can seem complex and counterintuitive. In bioinformatics, however, the time saved by using the command line rather than a graphical interface can be on the order of weeks or even months.

Tutorial 1

Go to this link and sign up for this tutorial. It’s a really nice, interactive introduction to Linux:

Tutorial 2

You now need to sign into your terminal:

For Mac users:


For Windows users:

You need to follow the PuTTY steps from earlier today!

Tutorial 2a

You can now have a go at finding stuff. Go to this link and follow the tutorial (haiku.txt is in your home directory):

Follow the tutorial, trying the commands in your terminal window.

Once you get to the bit about species etc., move on to the next tutorial.

P.S. If you have the inclination, the whole tutorial is great (go to

Tutorial 2b

Now, with your newfound Linux skills, go to the exercises_grep_awk directory like this:

cd /opt/exercises_grep_awk/

Now find all the lines in the file data/exercise1_grep.txt that contain the word "start", and follow the instructions within the tutorial!
Clue: try using grep.
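
If grep is new to you, the sketch below shows a few of its common options on a small throwaway file. The file name demo.txt and its contents are made up for illustration and are not the exercise file, so treat this as a warm-up rather than the solution.

```shell
# Build a tiny throwaway file (hypothetical contents, not the real exercise file).
demo_dir=$(mktemp -d)
printf 'start here\nnothing to see\nplease restart\nSTART again\n' > "$demo_dir/demo.txt"

# Print every line containing the string "start" (case-sensitive, so "restart" also matches).
grep 'start' "$demo_dir/demo.txt"

# -n adds line numbers, -i ignores case, -w matches whole words only.
grep -n -i -w 'start' "$demo_dir/demo.txt"

# -c counts the matching lines instead of printing them.
match_count=$(grep -c 'start' "$demo_dir/demo.txt")
echo "$match_count"
```

Note how the plain search also picks up "restart", while -w restricts the match to the whole word.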

Once you have finished that exercise, try this:

head data/genes/chr8.gff

As you can see it is a tab-separated file, which we could easily read in Excel or Calc.

The format specifications are defined here, but in short:

  • The first, fourth and fifth columns contain the chromosome name and coordinates
  • The second column describes the tool or resource that generated the annotation
  • The third column describes the type of feature (e.g. gene, transcript, exon, TF binding site, histone acetylation mark, etc.)
  • The ninth column contains several fields, separated by a semicolon
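
To make the column layout concrete, here is a minimal sketch that pulls those columns out of a single made-up record. The file demo.gff, the source name demo_source and the attribute values are assumptions for illustration, not lines from chr8.gff.

```shell
# One made-up GFF-style record with nine tab-separated fields (hypothetical values).
demo_dir=$(mktemp -d)
printf 'chr8\tdemo_source\tgene\t1000\t5000\t.\t+\t.\tID=gene1;symbol=DEMO1\n' > "$demo_dir/demo.gff"

# cut splits on tabs by default; -f selects columns 1, 4 and 5 (chromosome, start, end).
coords=$(cut -f 1,4,5 "$demo_dir/demo.gff")
echo "$coords"

# Take the ninth column and put each semicolon-separated field on its own line.
attributes=$(cut -f 9 "$demo_dir/demo.gff" | tr ';' '\n')
echo "$attributes"
```

The same two commands can be pointed at data/genes/chr8.gff, ideally piped through head so the output stays short.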

Can you print all the lines for features located between positions 5000000 and 10000000?
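
One way to approach this kind of question is a numeric comparison in awk. The sketch below uses three made-up records in a hypothetical demo.gff, and it assumes "between" means that a feature both starts and ends inside the range; other readings (for example, any overlap with the range) would need a different condition.

```shell
# Three made-up records (hypothetical values): before, inside, and straddling the range.
demo_dir=$(mktemp -d)
printf 'chr8\tdemo\tgene\t4000000\t4500000\t.\t+\t.\tID=a\n'  > "$demo_dir/demo.gff"
printf 'chr8\tdemo\tgene\t6000000\t6200000\t.\t-\t.\tID=b\n' >> "$demo_dir/demo.gff"
printf 'chr8\tdemo\tgene\t9900000\t10500000\t.\t+\t.\tID=c\n' >> "$demo_dir/demo.gff"

# -F'\t' makes awk split on tabs. A condition with no action prints the whole line.
# Keep records whose start ($4) and end ($5) both fall inside the range.
inside=$(awk -F'\t' '$4 >= 5000000 && $5 <= 10000000' "$demo_dir/demo.gff")
echo "$inside"
```

Only the second record passes both tests here; the third starts inside the range but ends beyond it.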

Try this command. What is it doing?

awk '{print $1, $5-$4, $9}' data/genes/chr8.gff | grep -v '^#' |  head

Now try this command and figure out what it is doing:

awk '$9 ~ /symbol=MIR/ {print $0}' data/genes/chr8.gff

By modifying the previous commands, can you calculate the length of the gene POU5F1B?

ANSWER will appear after discussion