CLIMB Course - Vietnam (Day 1)

Course Outline:

Day 1

Morning:

  • Introduction lecture to WGS for Molecular Epidemiology
  • Introduction to CLIMB (Cloud Infrastructure for Microbial Bioinformatics) as a platform for WGS analysis.

Afternoon:

  • Launching a CLIMB VM.
  • Using RStudio in a VM.
  • Example of a bioinformatics application: using Lubuntu to run the Artemis genome browser.

Day 2

Morning:

  • Introduction to EDGE.
  • Example of uploading and running a sample on EDGE.
  • Anthrax on the Subway: an example of source attribution.

Afternoon:

  • Analysis of the EDGE results and a self-directed example workflow.

1. Bryn: The CLIMB portal

You can sign up to CLIMB via bryn.climb.ac.uk, but please note that the first user should be a principal investigator or independent investigator.

NOTE: A CLIMB account has been set up by OCHRU for the course. Details will be shared on the day.

2. Launching a GVL server

The Genomics Virtual Laboratory is our standard ‘image’.

Launching a GVL server:

3. RStudio

RStudio is an integrated development environment (IDE) for writing and running R code:

  • RStudio Server provides access to RStudio and, by extension, R from within your browser.

  • Bring up the GVL homepage.

  • Click on RStudio.

  • Input your login credentials. Your username will be ‘researcher’ and the password will be the one you specified when you created the GVL instance.

  • This should bring up an RStudio interface.

From this interface you should be able to use R in a way that many of you will be familiar with.
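Before moving on, it can help to confirm that the R session is working. The snippet below is a minimal sketch, not part of the original course material: it prints the R version, checks whether ggplot2 is installed (the GVL image is assumed, not guaranteed, to ship with it), and confirms that the built-in datasets used in this tutorial are present.

```r
# Report the R version running on the GVL instance
print(R.version.string)

# Check that ggplot2 is installed before running the plotting examples
has_ggplot2 <- requireNamespace("ggplot2", quietly = TRUE)
if (has_ggplot2) {
  print(packageVersion("ggplot2"))
} else {
  # Assumption: the instance can reach a CRAN mirror to install packages
  message("ggplot2 not found; install.packages(\"ggplot2\") would be needed")
}

# Confirm the built-in datasets used in this tutorial are available
print(dim(cars))    # cars has 50 rows and 2 columns (speed, dist)
print(dim(mtcars))  # mtcars has 32 rows and 11 columns
```

Paste the lines into the RStudio console, or into a new script and run them one at a time.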

Cars Plotting Example

Run these commands one-by-one. They will produce a number of different plots from the stock dataset cars.

The plots are produced by ggplot2, a powerful plotting package built on the grammar of graphics.
This allows you to start with your base dataset and build up a plot layer by layer.


# Load the library
library(ggplot2) 

# Line plot
ggplot(cars, aes(speed, dist)) + geom_line()

# Bar chart
ggplot(cars, aes(speed, dist)) + geom_bar(stat = "identity")

# Line plot on top of the bar chart
ggplot(cars, aes(speed, dist)) + geom_bar(stat = "identity") + geom_line()
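The layer-by-layer idea is easier to see when each step is stored in a variable. The following is an illustrative sketch only; the variable names (base, points_only, with_trend, final) and the choice of a linear trend line are assumptions made for this example, not part of the original exercise.

```r
library(ggplot2)

# Base object: the dataset and the aesthetic mapping, with no geometry yet
base <- ggplot(cars, aes(speed, dist))

# Add one geometry at a time; each `+` returns a new plot object
points_only <- base + geom_point()
with_trend  <- points_only + geom_smooth(method = "lm", formula = y ~ x)

# Labels are added with the same `+` operator
final <- with_trend +
  labs(x = "Speed (mph)", y = "Stopping distance (ft)",
       title = "cars: stopping distance against speed")

# Printing a plot object draws it in the RStudio Plots pane
print(final)
```

Printing base, points_only and with_trend in turn shows how the plot changes as each layer is added.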

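Some more advanced plotting using qplot (this tutorial was taken from http://www.statmethods.net/advgraphs/ggplot2.html). Note that qplot is deprecated as of ggplot2 3.4.0, so newer installations will print a deprecation warning when these commands are run: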

# ggplot2 examples
library(ggplot2) 

# create factors with value labels 
mtcars$gear <- factor(mtcars$gear, levels=c(3,4,5),
   labels=c("3gears","4gears","5gears"))
mtcars$am <- factor(mtcars$am, levels=c(0,1),
   labels=c("Automatic","Manual"))
mtcars$cyl <- factor(mtcars$cyl, levels=c(4,6,8),
   labels=c("4cyl","6cyl","8cyl"))

# Kernel density plots for mpg
# grouped by number of gears (indicated by color)
qplot(mpg, data=mtcars, geom="density", fill=gear, alpha=I(.5), 
   main="Distribution of Gas Mileage", xlab="Miles Per Gallon", 
   ylab="Density")

# Scatterplot of mpg vs. hp for each combination of gears and cylinders
# in each facet, transmission type is represented by shape and color
qplot(hp, mpg, data=mtcars, shape=am, color=am, 
   facets=gear~cyl, size=I(3),
   xlab="Horsepower", ylab="Miles per Gallon") 

# Separate regressions of mpg on weight for each number of cylinders
qplot(wt, mpg, data=mtcars, geom=c("point", "smooth"), 
   method="lm", formula=y~x, color=cyl, 
   main="Regression of MPG on Weight", 
   xlab="Weight", ylab="Miles per Gallon")

# Boxplots of mpg by number of gears
# observations (points) are overlaid and jittered
qplot(gear, mpg, data=mtcars, geom=c("boxplot", "jitter"), 
   fill=gear, main="Mileage by Gear Number",
   xlab="", ylab="Miles per Gallon")
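Because recent ggplot2 releases mark qplot as deprecated, it may be useful to see how two of the plots above could be written with ggplot directly. This is a sketch under stated assumptions rather than the tutorial's own code: it works on a copy of mtcars (the name mt is made up for this example) so that the factor conversion can be re-run without producing NA values.

```r
library(ggplot2)

# Work on a copy so the factor conversion can be re-run safely
mt <- mtcars
mt$gear <- factor(mt$gear, levels = c(3, 4, 5),
                  labels = c("3gears", "4gears", "5gears"))
mt$cyl <- factor(mt$cyl, levels = c(4, 6, 8),
                 labels = c("4cyl", "6cyl", "8cyl"))

# Kernel density of mpg grouped by number of gears
p_density <- ggplot(mt, aes(mpg, fill = gear)) +
  geom_density(alpha = 0.5) +
  labs(title = "Distribution of Gas Mileage",
       x = "Miles Per Gallon", y = "Density")

# Separate linear regressions of mpg on weight for each cylinder count
p_regression <- ggplot(mt, aes(wt, mpg, color = cyl)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(title = "Regression of MPG on Weight",
       x = "Weight", y = "Miles per Gallon")

print(p_density)
print(p_regression)
```

Here the method and formula arguments go straight to geom_smooth, which makes it explicit which layer they apply to.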

4. VNC Virtual Desktop

During this tutorial you will open a virtual desktop on your instance and visualise a genome using the Artemis genome browser.

  • Click the VNC link on the GVL homepage.

  • Select ubuntu as the username from the drop-down box.

  • Input your GVL instance password to log in.

  • Load a Terminal window (Start > Accessories > LXTerminal) or click on the terminal on the desktop.

  • Run Artemis by typing the command art in the terminal and pressing Return.

Staphylococcus aureus example:

  • Go to: http://www.ebi.ac.uk/genomes/bacteria.html

  • Find Staphylococcus aureus subsp. aureus N315 and make a note of the accession number in the 4th column (BA000018)

  • Load the genome into Artemis: File -> Open from EBI - dbfetch

  • Type in ‘BA000018’

  • Find the gene: Goto -> Navigator -> Goto Feature With Gene Name: mecA

What information does Artemis give us on this gene / region?

  • Nucleotide sequence
  • Amino acid sequence
  • Gene order and orientation
  • Size of genes
  • Promoter regions / ribosome binding sites?

  • Click ‘Graph’ -> Toggle ‘GC Content %’

  • This will show the GC content of the region.

  • Zoom out and observe how the GC content of SCCmec differs from that of the surrounding chromosome.
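To make the GC content graph more concrete, the calculation behind it can be sketched in R: count the G and C bases in a window of sequence and express them as a percentage of the window length. The functions gc_percent and gc_windows, and the 30-base toy sequence, are made up for illustration; they are not taken from BA000018, and Artemis may use a different windowing scheme.

```r
# Percentage of G and C bases in a DNA string
gc_percent <- function(dna) {
  bases <- strsplit(toupper(dna), "")[[1]]
  100 * sum(bases %in% c("G", "C")) / length(bases)
}

# GC% in consecutive, non-overlapping windows of `window` bases
gc_windows <- function(dna, window) {
  starts <- seq(1, nchar(dna) - window + 1, by = window)
  vapply(starts,
         function(s) gc_percent(substr(dna, s, s + window - 1)),
         numeric(1))
}

# Toy sequence (hypothetical): a GC-rich block, an AT-rich block, a mixed block
toy <- paste0("GCGCGGCCGC", "ATATTAATAT", "GCATGCATGC")

print(gc_windows(toy, window = 10))
# [1] 100   0  60
```

A sharp change in GC% between neighbouring windows, like the drop from the first block to the second here, is the kind of signal to look for when zooming out in Artemis.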

Try finding your favourite gene in your preferred organism.