CLIMB Vietnam
Course Outline:
Day 1
Morning:
- Introduction lecture to WGS for Molecular Epidemiology
- Introduction to CLIMB (Cloud Infrastructure for Microbial Bioinformatics) as a platform for WGS analysis.
Afternoon:
- Launching a CLIMB VM.
- Using RStudio in a VM.
- Example of a bioinformatics application: using Lubuntu to run the Artemis genome browser.
Day 2
Morning:
- Introduction to EDGE.
- Example of uploading and running a sample on EDGE.
- Anthrax on the Subway: an example of source attribution.
Afternoon:
- Analysis of the EDGE results and a self-directed example workflow.
1. Bryn: The CLIMB portal
You can sign up to CLIMB via bryn.climb.ac.uk, but please note that the first user should be a principal investigator or independent investigator.
NOTE: A CLIMB account has been set up by OCHRU for the course. Details will be shared on the day.
2. Launching a GVL server
The Genomics Virtual Laboratory (GVL) is our standard ‘image’.
Launching a GVL server:
3. RStudio
RStudio is a development environment for running R code. RStudio Server provides access to RStudio and, by extension, R from within your browser.
- Bring up the GVL.
- Click on RStudio.
- Input your log-in credentials. Your username will be ‘researcher’ and the password will be the same as the one you specified when you made the GVL instance.
- This should bring up an RStudio interface.
From this interface you should be able to use R in a way that many of you will be familiar with.
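Before starting the plotting examples, it can help to confirm that the R session is working and that ggplot2 is available. The following is a minimal sketch using base R only; whether ggplot2 is preinstalled on your GVL instance is an assumption, so the check reports either outcome rather than failing.

```r
# Report the R version running on the server
r_version <- R.version.string
print(r_version)

# Check whether ggplot2 is available without attaching it
has_ggplot2 <- requireNamespace("ggplot2", quietly = TRUE)
if (has_ggplot2) {
  message("ggplot2 is installed")
} else {
  message("ggplot2 is missing; install.packages(\"ggplot2\") should add it")
}
```

Paste the lines into the RStudio console one at a time, or run them from a script pane.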
Cars Plotting Example
Run these commands one-by-one. They will produce a number of different plots from the stock dataset cars.
The plots are produced by ggplot2, a powerful plotting package that uses the grammar of graphics.
This allows you to start with your base dataset and add plots layer by layer.
# Load the library
library(ggplot2)
# Line plot
ggplot(cars, aes(speed, dist))+ geom_line()
# Barchart
ggplot(cars, aes(speed, dist))+ geom_bar(stat="identity")
# Line plot on top of the bar plot
ggplot(cars, aes(speed, dist))+ geom_bar(stat="identity") + geom_line()
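The layer-by-layer idea can be made more explicit by storing the plot in a variable and adding one layer at a time. This is a small sketch on the same cars dataset; the variable names (base_plot, with_points, with_trend) are illustrative and not part of the original tutorial.

```r
library(ggplot2)

# Base plot: dataset and aesthetic mappings only, no geometry yet
base_plot <- ggplot(cars, aes(x = speed, y = dist))

# Each `+` returns a new plot object with one more layer
with_points <- base_plot + geom_point()
with_trend <- with_points +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Printing a plot object draws it in the RStudio Plots pane
print(with_trend)
```

Because each step is an ordinary R object, you can print base_plot, with_points and with_trend in turn to see what each layer contributes.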
Some more advanced plotting using qplot (this tutorial was taken from http://www.statmethods.net/advgraphs/ggplot2.html):
# ggplot2 examples
library(ggplot2)
# create factors with value labels
mtcars$gear <- factor(mtcars$gear, levels=c(3,4,5),
                      labels=c("3gears","4gears","5gears"))
mtcars$am <- factor(mtcars$am, levels=c(0,1),
                    labels=c("Automatic","Manual"))
mtcars$cyl <- factor(mtcars$cyl, levels=c(4,6,8),
                     labels=c("4cyl","6cyl","8cyl"))
# Kernel density plots for mpg
# grouped by number of gears (indicated by color)
qplot(mpg, data=mtcars, geom="density", fill=gear, alpha=I(.5),
      main="Distribution of Gas Mileage", xlab="Miles Per Gallon",
      ylab="Density")
# Scatterplot of mpg vs. hp for each combination of gears and cylinders
# in each facet, transmission type is represented by shape and color
qplot(hp, mpg, data=mtcars, shape=am, color=am,
      facets=gear~cyl, size=I(3),
      xlab="Horsepower", ylab="Miles per Gallon")
# Separate regressions of mpg on weight for each number of cylinders
qplot(wt, mpg, data=mtcars, geom=c("point", "smooth"),
      method="lm", formula=y~x, color=cyl,
      main="Regression of MPG on Weight",
      xlab="Weight", ylab="Miles per Gallon")
# Boxplots of mpg by number of gears
# observations (points) are overlaid and jittered
qplot(gear, mpg, data=mtcars, geom=c("boxplot", "jitter"),
      fill=gear, main="Mileage by Gear Number",
      xlab="", ylab="Miles per Gallon")
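qplot() is deprecated in recent ggplot2 releases (from version 3.4.0), so the same plots may produce warnings depending on the version installed on your instance. As a hedged sketch, two of the plots above can be written with ggplot() directly. The data frame name cars_df is illustrative; it is built from a fresh copy of the built-in data so the block runs on its own.

```r
library(ggplot2)

# Work on a fresh copy of the built-in data so this runs on its own,
# whether or not mtcars was already converted to factors above
cars_df <- datasets::mtcars
cars_df$gear <- factor(cars_df$gear, levels = c(3, 4, 5),
                       labels = c("3gears", "4gears", "5gears"))
cars_df$cyl <- factor(cars_df$cyl, levels = c(4, 6, 8),
                      labels = c("4cyl", "6cyl", "8cyl"))

# Kernel density of mpg, filled by number of gears
density_plot <- ggplot(cars_df, aes(x = mpg, fill = gear)) +
  geom_density(alpha = 0.5) +
  labs(title = "Distribution of Gas Mileage",
       x = "Miles Per Gallon", y = "Density")

# Separate linear regressions of mpg on weight for each cylinder count
regression_plot <- ggplot(cars_df, aes(x = wt, y = mpg, color = cyl)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(title = "Regression of MPG on Weight",
       x = "Weight", y = "Miles per Gallon")

print(density_plot)
print(regression_plot)
```

Here the smoothing options are passed to geom_smooth() only, which is where method and formula apply.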
4. VNC Virtual Desktop
During this tutorial you will open a virtual desktop to your instance and visualise a genome using the Artemis genome browser.
- Click the VNC link on the GVL homepage.
- Select ubuntu as the username from the drop-down box.
- Input your GVL instance password to log in.
- Open a terminal window (Start > Accessories > LXTerminal) or click on the terminal icon on the desktop.
- Run Artemis by typing the command ‘art’ in the terminal and pressing Return.
Staphylococcus aureus example:
- Find Staphylococcus aureus subsp. aureus N315 and make a note of the accession number in the 4th column (BA000018).
- Load the genome into Artemis: File > Open from EBI - dbfetch
- Type in ‘BA000018’
- Go to -> Navigator -> Go to feature with gene name: mecA
What information does Artemis give us on this gene / region?
· Nucleotide sequence
· Amino acid sequence
· Gene order and orientation
· Size of genes
· Promoter regions / ribosome binding sites?
- Click ‘Graph’ -> Toggle ‘GC Content %’
- This will show the GC content of the region.
- Zoom out and observe the difference in GC content in SCCmec.
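To make the GC content graph less of a black box, here is a rough sketch of how a windowed GC percentage can be computed in base R. It is not Artemis code and Artemis may choose its windows differently. The function names (gc_percent, windowed_gc) and the toy sequence are hypothetical and used purely for illustration.

```r
# Percentage of G and C bases in a DNA string
gc_percent <- function(dna) {
  bases <- strsplit(toupper(dna), "")[[1]]
  100 * sum(bases %in% c("G", "C")) / length(bases)
}

# GC% in consecutive windows along the sequence
windowed_gc <- function(dna, window, step = window) {
  starts <- seq(1, nchar(dna) - window + 1, by = step)
  vapply(starts,
         function(s) gc_percent(substr(dna, s, s + window - 1)),
         numeric(1))
}

# Hypothetical toy sequence: an AT-rich stretch flanked by GC-rich sequence
toy <- paste0("GCGCGGCC", "ATATTAAT", "CCGGGCGC")
gc_by_window <- windowed_gc(toy, window = 8)
print(gc_by_window)  # the three windows score 100, 0 and 100
```

A region whose base composition differs from its surroundings shows up as a dip or peak in the windowed values, which is the kind of signal to look for when zooming out over SCCmec.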
Try finding your favourite gene in your preferred organism.