- Introductory lecture on WGS for Molecular Epidemiology.
- Introduction to CLIMB (Cloud Infrastructure for Microbial Bioinformatics) as a platform for WGS analysis.
- Launching a CLIMB VM.
- Using RStudio in a VM.
- Example of a bioinformatics application: using Lubuntu to run the Artemis genome browser.
- Introduction to EDGE.
- Example of uploading and running a sample on EDGE.
- ‘Anthrax on the Subway’: an example of source attribution.
- Analysis of the EDGE results and a self-directed example workflow.
1. Bryn: The CLIMB portal
You can sign up to CLIMB via bryn.climb.ac.uk, but please note that the first user to register should be a principal investigator or an independent investigator.
NOTE: A CLIMB account has been set up by OCHRU for the course. Details will be shared on the day.
2. Launching a GVL server
The Genomics Virtual Laboratory (GVL) is our standard ‘image’.
Launching a GVL server:
3. RStudio
RStudio is an integrated development environment (IDE) for writing and running R code.
RStudio Server provides access to RStudio and, by extension, R from within your browser.
Bring up the GVL homepage in your browser.
Click on RStudio.
Enter your login credentials: the username is ‘researcher’ and the password is the one you specified when you created the GVL instance.
This should bring up the RStudio interface.
From this interface you can use R in the way that many of you will already be familiar with.
Cars Plotting Example
Run these commands one by one. They produce several different plots from the built-in dataset cars.
The plots are produced by ggplot2, a powerful plotting package based on the grammar of graphics.
This lets you start with a base dataset and build up a plot layer by layer.
# Load the library
library(ggplot2)

# Line plot
ggplot(cars, aes(speed, dist)) +
  geom_line()

# Bar chart
ggplot(cars, aes(speed, dist)) +
  geom_bar(stat = "identity")

# Line plot on top of the bar chart
ggplot(cars, aes(speed, dist)) +
  geom_bar(stat = "identity") +
  geom_line()
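A ggplot2 plot is an ordinary R object, so you can see the layering idea directly by building a plot in stages and inspecting it. The sketch below uses the same cars dataset. The object name p and the choice of geom_point() plus a linear geom_smooth() are illustrative only.

```r
library(ggplot2)

# Start from the base dataset and aesthetic mapping only: no layers yet
p <- ggplot(cars, aes(speed, dist))

# Add a first layer: one point per observation
p <- p + geom_point()

# Add a second layer: a straight-line fit with its confidence band
p <- p + geom_smooth(method = "lm", formula = y ~ x)

# Each geom added with `+` is stored as a separate layer in the plot object
length(p$layers)

# Typing the object's name at the console draws the plot
p
```

Storing the plot in a variable lets you try out extra layers without retyping the base plot each time.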
Some more advanced plotting using qplot() (this tutorial was taken from http://www.statmethods.net/advgraphs/ggplot2.html):
# ggplot2 examples
library(ggplot2)

# Create factors with value labels
# (run these three lines only once: re-running them on the already
# recoded columns turns every value into NA)
mtcars$gear <- factor(mtcars$gear, levels = c(3, 4, 5),
                      labels = c("3gears", "4gears", "5gears"))
mtcars$am <- factor(mtcars$am, levels = c(0, 1),
                    labels = c("Automatic", "Manual"))
mtcars$cyl <- factor(mtcars$cyl, levels = c(4, 6, 8),
                     labels = c("4cyl", "6cyl", "8cyl"))

# Kernel density plots for mpg,
# grouped by number of gears (indicated by colour)
qplot(mpg, data = mtcars, geom = "density", fill = gear, alpha = I(.5),
      main = "Distribution of Gas Mileage", xlab = "Miles Per Gallon",
      ylab = "Density")

# Scatterplot of mpg vs. hp for each combination of gears and cylinders;
# in each facet, transmission type is represented by shape and colour
qplot(hp, mpg, data = mtcars, shape = am, color = am,
      facets = gear ~ cyl, size = I(3),
      xlab = "Horsepower", ylab = "Miles per Gallon")

# Separate regressions of mpg on weight for each number of cylinders
qplot(wt, mpg, data = mtcars, geom = c("point", "smooth"),
      method = "lm", formula = y ~ x, color = cyl,
      main = "Regression of MPG on Weight",
      xlab = "Weight", ylab = "Miles per Gallon")

# Boxplots of mpg by number of gears;
# observations (points) are overlaid and jittered
qplot(gear, mpg, data = mtcars, geom = c("boxplot", "jitter"),
      fill = gear, main = "Mileage by Gear Number",
      xlab = "", ylab = "Miles per Gallon")
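qplot() has been formally deprecated since ggplot2 3.4.0. The calls above should still run, but recent versions may print a deprecation warning. The sketch below writes the first density plot with ggplot() directly. It copies the built-in datasets::mtcars into a new data frame, so it gives the same result whether or not the recoding above has already been run. The names mt and p_density are our own choices.

```r
library(ggplot2)

# Work on a fresh copy of the built-in data rather than the recoded mtcars
mt <- datasets::mtcars
mt$gear <- factor(mt$gear, levels = c(3, 4, 5),
                  labels = c("3gears", "4gears", "5gears"))

# Kernel density of mpg, filled by number of gears;
# alpha set outside aes() is a fixed value, the equivalent of I(.5) in qplot()
p_density <- ggplot(mt, aes(x = mpg, fill = gear)) +
  geom_density(alpha = 0.5) +
  labs(title = "Distribution of Gas Mileage",
       x = "Miles Per Gallon", y = "Density")

p_density
```

The other three qplot() calls translate the same way. Mappings such as fill = gear go inside aes(), fixed settings such as size = 3 go in the geom, and facets = gear ~ cyl becomes + facet_grid(gear ~ cyl).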
4. VNC Virtual Desktop
During this tutorial you will open a virtual desktop on your instance and visualise a genome using the Artemis genome browser.
Click the VNC link on the GVL homepage.
Select ‘ubuntu’ as the username from the drop-down box.
Input your GVL instance password to log in.
Open a terminal window (Start > Accessories > LXTerminal) or click the terminal icon on the desktop.
Run Artemis by typing the command ‘art’ and pressing Return.
Staphylococcus aureus example:
Find Staphylococcus aureus subsp. aureus N315 and make a note of the accession number in the 4th column (BA000018)
Load the genome into Artemis: File > Open from EBI - dbfetch.
Type in ‘BA000018’.
Go to -> Navigator -> Go to feature with gene name: mecA
What information does Artemis give us about this gene/region?
· Nucleotide sequence
· Amino acid sequence
· Gene order and orientation
· Size of genes
· Promoter regions / ribosome binding sites?
Click ‘Graph’ -> Toggle ‘GC Content %’
This will show the GC content of the region.
Zoom out and observe how the GC content of SCCmec differs from that of the surrounding genome.
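Artemis computes this graph over a window that slides along the sequence, so a region such as SCCmec stands out when its base composition differs from its surroundings. The base-R sketch below shows the idea on a made-up toy sequence. The function names gc_percent() and gc_window() are hypothetical, and the window size and step are arbitrary choices, not Artemis's own settings.

```r
# GC content (%) of a DNA string: 100 * (number of G + C) / sequence length
gc_percent <- function(dna) {
  bases <- strsplit(toupper(dna), "")[[1]]
  100 * sum(bases %in% c("G", "C")) / length(bases)
}

# Sliding-window GC content: move a window of `window` bases along the
# sequence, `step` bases at a time, and record the GC % of each window
gc_window <- function(dna, window, step = 1) {
  n <- nchar(dna)
  stopifnot(window >= 1, window <= n, step >= 1)
  starts <- seq(1, n - window + 1, by = step)
  gc <- vapply(starts,
               function(s) gc_percent(substr(dna, s, s + window - 1)),
               numeric(1))
  data.frame(start = starts, gc = gc)
}

# Toy sequence: AT-rich stretches either side of a GC-rich core
dna <- "ATATATATGCGCGCGCATATATAT"
gc_percent(dna)                  # whole sequence: 8 of 24 bases are G or C
res <- gc_window(dna, window = 8, step = 4)
res$gc                           # 0 50 100 50 0

# Plot GC % against window start, similar in spirit to the Artemis graph
plot(gc ~ start, data = res, type = "l",
     xlab = "Window start (bp)", ylab = "GC content (%)")
```

A smaller window gives a noisier curve. A larger window smooths out local variation but can blur the boundaries of a region with unusual composition.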
Try finding your favourite gene in your preferred organism.