Install software from the web directly onto GVL?

Apologies if this is an obvious thing to do, but I am a relative newbie to command-line bioinformatics and a CLIMB newbie too…

I want to use a GVL to analyse a metagenome that I have assembled de novo. The tools I would like to use are not on the CLIMB system, e.g. the MetaBAT (https://bitbucket.org/berkeleylab/metabat/src/master/) and MaxBin (https://sourceforge.net/projects/maxbin/) binning tools, and CheckM (http://ecogenomics.github.io/CheckM/) for genome bin quality assessment.

How do I install these tools into a VM? Do I need to set up a custom ubuntu VM or will it work on a GVL (where I can use other tools that are already on there)? Do I need to download installation files locally and transfer them or is there a way to install directly from the source website onto the VM?

Thanks in advance for your help!

Sorry I’ve taken so long to get back to you - I wanted to write you a thorough and tested guide for how to go about this.


I think that because none of this software appears to be pre-installed on GVL, or available through the brew package manager, the best option is to launch a custom VM, following the tutorial to generate and use an SSH key to access it: Generating an SSH key on a Mac or on Windows.

I’ve created the following list of commands that you should just be able to copy and paste to install the software that you linked (I tested the following commands using Ubuntu 16.04). I’ve also tried to comment these commands (with #) to indicate what’s happening at each stage.

## System packages
# Upgrade system packages
sudo apt-get update && sudo apt-get upgrade  # Answer "y" when prompted

# Install system tools required for building the informatics software below
sudo apt-get install build-essential autoconf unzip python-minimal python-biopython python-pip    # Answer "y" when prompted


## Metabat/Metabat2
# Download binaries
wget https://bitbucket.org/berkeleylab/metabat/downloads/metabat-static-binary-linux-x64_v2.12.1.tar.gz
# Unpack archive
tar xzvf metabat-static-binary-linux-x64_v2.12.1.tar.gz
# Rename directory to contain version name
mv metabat metabat-2.12.1
# Move directory to another directory on the PATH (explanation: http://www.linfo.org/path_env_var.html)
sudo mv metabat-2.12.1/ /usr/local/bin/
# Find all executable files (not directories) in the metabat directory and link them onto the PATH
find /usr/local/bin/metabat-2.12.1/ -mindepth 1 -type f -executable -exec sudo ln -s {} /usr/local/bin/ \;


## MaxBin
# Download archive
wget 'https://downloads.sourceforge.net/project/maxbin/MaxBin-2.2.4.tar.gz'
# Unpack archive
tar xvzf MaxBin-2.2.4.tar.gz
# Move directory to PATH
sudo mv MaxBin-2.2.4 /usr/local/bin/
# Compile the MaxBin executable (details in /usr/local/bin/MaxBin-2.2.4/README.txt)
cd /usr/local/bin/MaxBin-2.2.4/src 
sudo make
# Download and install MaxBin dependencies using the bundled script (details in /usr/local/bin/MaxBin-2.2.4/README.txt)
cd /usr/local/bin/MaxBin-2.2.4
sudo ./autobuild_auxiliary   # Answer "yes" at the CPAN prompt
# Make sure the ubuntu user is the owner of the dependency directory
sudo chown -R ubuntu:ubuntu /usr/local/bin/MaxBin-2.2.4/auxiliary
# Patch run_MaxBin.pl so that we can link it to the PATH with the next command
sudo sed -i 's/$Bin/$RealBin/g' /usr/local/bin/MaxBin-2.2.4/run_MaxBin.pl
# Link run_MaxBin.pl to PATH
sudo ln -s /usr/local/bin/MaxBin-2.2.4/run_MaxBin.pl /usr/local/bin/


## CheckM
# Prodigal (CheckM dependency)
# Download prodigal binary and rename to "prodigal"
wget https://github.com/hyattpd/Prodigal/releases/download/v2.6.3/prodigal.linux -O prodigal
# Move it to PATH directory
sudo mv prodigal /usr/local/bin
# Make /usr/local/bin/prodigal executable
sudo chmod +x /usr/local/bin/prodigal

# pplacer (CheckM dependency)
# Download zip archive
wget https://github.com/matsen/pplacer/releases/download/v1.1.alpha19/pplacer-linux-v1.1.alpha19.zip
# Extract archive
unzip pplacer-linux-v1.1.alpha19.zip
# Install the python scripts bundled with pplacer
cd pplacer-Linux-v1.1.alpha19/scripts
sudo python setup.py install
# Move the pplacer directory to the PATH
cd ../../
sudo mv pplacer-Linux-v1.1.alpha19 /usr/local/bin/
# Find and link executables into PATH directory
find /usr/local/bin/pplacer-Linux-v1.1.alpha19/ -mindepth 1 -maxdepth 1 -type f -exec sudo ln -s {} /usr/local/bin/ \;

# HMMER (CheckM dependency)
# This is a shortcut: HMMER has already been built by the MaxBin autobuild_auxiliary step, so just copy its executables into a PATH directory
find /usr/local/bin/MaxBin-2.2.4/auxiliary/hmmer-3.1b1/src -mindepth 1 -type f -executable -exec sudo cp {} /usr/local/bin/ \;

# CheckM
# Make directory and download the checkm database
sudo mkdir /opt/checkm_data
wget https://data.ace.uq.edu.au/public/CheckM_databases/checkm_data_2015_01_16.tar.gz
# Unpack the database into the newly created directory
sudo tar xvzf checkm_data_2015_01_16.tar.gz -C /opt/checkm_data/
# Make sure everyone can read the database
sudo chmod -R 0755 /opt/checkm_data
# Install checkm with pip
sudo pip install checkm-genome
# Point checkm at the database we downloaded and unpacked
sudo checkm data setRoot # answer "/opt/checkm_data/" when prompted

Hopefully that gives a step-by-step indication of how to install software on a fresh VM. You can see that the process is mostly similar for all of the packages:

  1. Check the software's manual or web page for instructions.
  2. Download the software from the internet.
  3. Compile or set up if needed.
  4. Move or link it to a directory on the PATH so that you can execute the program from anywhere on your VM.
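
As a rough sketch of step 4, the find-and-link pattern used in the commands above could be wrapped in a small helper. The function name link_onto_path and its two arguments are made up for illustration, and it assumes GNU find (as on Ubuntu). For a system-wide install the target would be /usr/local/bin and the commands inside would need sudo.

```shell
# Hypothetical helper: symlink every executable regular file found directly
# inside SRC_DIR into BIN_DIR. Assumes GNU find (for -executable).
link_onto_path() {
    # Resolve SRC_DIR to an absolute path so the symlinks do not dangle
    src_dir=$(cd "$1" && pwd) || return 1
    bin_dir=$2
    mkdir -p "$bin_dir"
    # -type f skips sub-directories; -maxdepth 1 stays at the top level
    find "$src_dir" -mindepth 1 -maxdepth 1 -type f -executable \
        -exec ln -sf {} "$bin_dir/" \;
}

# Example (not run here): link tools unpacked in ~/tools/mytool into ~/bin
# link_onto_path "$HOME/tools/mytool" "$HOME/bin"
```

Linking rather than copying keeps each tool in its own versioned directory, so removing a tool later only means deleting that directory and its links.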

Please let me know if something doesn’t work, or a step is unclear.

Hi Matt,

Thanks very much for putting this together for me, it’s much appreciated. I will work through it today/tomorrow and let you know if I get stuck.

Thanks!

Hi Matt,

OK so I am stuck at the first hurdle :confused:

I have created a custom VM as a climb.group flavor (I hope to share this instance with a few others later this week as a quick demo for my analyses). I created a new SSH key for this and pasted the public key into the website box so that I could launch this server, which I now have. I copied the key to the ~/.ssh folder on my Mac and ran chmod 600 on it. Then I tried to ssh to this server from my Terminal by typing

ssh ubuntu@

I got the following error message (I replaced IP address and ECDSA key fingerprint with *):

The authenticity of host '...** (...)’ can’t be established.
ECDSA key fingerprint is ****************************.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '...’ (ECDSA) to the list of known hosts.
ubuntu@...***: Permission denied (publickey).

What am I not doing?

On another note, I want to create a volume for my data to attach to this instance, and I see that to do this I need to log into Horizon. A username and password were generated on Bryn when I launched this instance, but when I follow the hyperlink to Horizon I am asked for a domain - what should I put here?

Thanks in advance!

Not sure why it didn’t show up on my reply but my ssh command did have the IP address after the @…!

If you called your SSH key anything different to the default, you need to tell ssh where it is using the -i flag.

ssh -i ~/.ssh/your_key_name.pem ubuntu@<ip_address>
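
If typing the -i flag every time gets tedious, one option is to record the key in your SSH client config. The sketch below is only an illustration: the function name add_ssh_host and the alias climb-vm are made up, and the key path and IP address are placeholders. Host, HostName, User, IdentityFile and IdentitiesOnly are standard ssh_config keywords.

```shell
# Hypothetical helper: append a host entry to an SSH client config file.
# Arguments: alias, IP address, private key path, [config file]
add_ssh_host() {
    alias_name=$1
    ip_address=$2
    key_file=$3
    config_file=${4:-$HOME/.ssh/config}
    cat >> "$config_file" <<EOF
Host $alias_name
    HostName $ip_address
    User ubuntu
    IdentityFile $key_file
    IdentitiesOnly yes
EOF
    # ssh expects the config file to be writable only by its owner
    chmod 600 "$config_file"
}

# Example (not run here; replace the placeholders with your own values):
# add_ssh_host climb-vm "<ip_address>" ~/.ssh/your_key_name.pem
# ssh climb-vm
```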


As for the Horizon domain field: you can leave this blank!

Thanks I’m now in. But Horizon doesn’t let me leave the Domain field blank: “This field is required.” Am I meant to be using the username and password generated when I launch my instance?

Whoops, my mistake, please use “Default” as the domain.

You are provided the remaining credentials for Horizon (advanced interface) when you expand the dropdown on https://bryn.climb.ac.uk.

Thank you, will make a note of all this so I don’t forget!

Working through software installation commands that you put together, and I ran into this issue (note I am working in a different custom group vm than I mentioned before, hence different name…):

ubuntu@metagenomics-standys:~$ mv metabat metabat-2.21.1
ubuntu@metagenomics-standys:~$ sudo mv metabat-2.21.1/ /usr/local/bin/
sudo: unable to resolve host metagenomics-standys
ubuntu@metagenomics-standys:~$ sudo mv metabat-2.21.1/ /usr/local/bin/
sudo: unable to resolve host metagenomics-standys
mv: cannot stat ‘metabat-2.21.1/’: No such file or directory

Thoughts?

"sudo: unable to resolve host metagenomics-standys" is just a warning; the first mv command completed successfully. That is why the second attempt fails with "No such file or directory" - the directory you are trying to move has already been moved to /usr/local/bin.

Continue!
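
If the warning gets annoying: it usually appears when the VM's hostname has no entry in /etc/hosts. I am assuming that is the cause here; I have not checked this VM. A minimal sketch of a fix, using a made-up helper name (missing_hosts_line), would be:

```shell
# Hypothetical helper: print the /etc/hosts line needed for HOST_NAME,
# or print nothing if the hosts file already mentions that name.
# Assumption: the sudo warning is caused by the hostname being absent
# from /etc/hosts.
missing_hosts_line() {
    hosts_file=$1
    host_name=$2
    # -F: fixed string, -w: whole word, -q: quiet
    if ! grep -qwF -- "$host_name" "$hosts_file"; then
        printf '127.0.1.1 %s\n' "$host_name"
    fi
}

# Example (not run here): append the line only if it is missing
# missing_hosts_line /etc/hosts "$(hostname)" | sudo tee -a /etc/hosts
```

The helper only prints the line, so you can look at what would be added before piping it into sudo tee.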

Aaaah ok thanks! Another thing I will make a note of…

Following your commands, I am trying to extract the pplacer archive (for CheckM) after downloading the zip archive, but I don't think the archive downloaded because I get the error:

unzip: cannot find or open pplacer-Linux-v1.1.alpha17.zip, pplacer-Linux-v1.1.alpha17.zip.zip or pplacer-Linux-v1.1.alpha17.zip.ZIP.

Full sequence of commands and responses on this bit:

ubuntu@metagenomics-standys:/usr/local/bin/MaxBin-2.2.4$ wget https://github.com/matsen/pplacer/releases/download/v1.1.alpha19/pplacer-linux-v1.1.alpha19.zip
–2018-05-08 12:07:04-- https://github.com/matsen/pplacer/releases/download/v1.1.alpha19/pplacer-linux-v1.1.alpha19.zip
Resolving github.com (github.com)… 192.30.253.112, 192.30.253.113
Connecting to github.com (github.com)|192.30.253.112|:443… connected.
HTTP request sent, awaiting response… 302 Found
Location: https://github-production-release-asset-2e65be.s3.amazonaws.com/271184/a143b512-ce97-11e6-9df6-0adc940068e3?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20180508%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20180508T120730Z&X-Amz-Expires=300&X-Amz-Signature=8ee76fccf5e35959419021c05fe57da8f0e9e1617ca82d17597c8d2bf3a9b94b&X-Amz-SignedHeaders=host&actor_id=0&response-content-disposition=attachment%3B%20filename%3Dpplacer-linux-v1.1.alpha19.zip&response-content-type=application%2Foctet-stream [following]
–2018-05-08 12:07:04-- https://github-production-release-asset-2e65be.s3.amazonaws.com/271184/a143b512-ce97-11e6-9df6-0adc940068e3?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20180508%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20180508T120730Z&X-Amz-Expires=300&X-Amz-Signature=8ee76fccf5e35959419021c05fe57da8f0e9e1617ca82d17597c8d2bf3a9b94b&X-Amz-SignedHeaders=host&actor_id=0&response-content-disposition=attachment%3B%20filename%3Dpplacer-linux-v1.1.alpha19.zip&response-content-type=application%2Foctet-stream
Resolving github-production-release-asset-2e65be.s3.amazonaws.com (github-production-release-asset-2e65be.s3.amazonaws.com)… 54.231.49.104
Connecting to github-production-release-asset-2e65be.s3.amazonaws.com (github-production-release-asset-2e65be.s3.amazonaws.com)|54.231.49.104|:443… connected.
Unable to establish SSL connection.
ubuntu@metagenomics-standys:/usr/local/bin/MaxBin-2.2.4$ unzip pplacer-Linux-v1.1.alpha17.zip
unzip: cannot find or open pplacer-Linux-v1.1.alpha17.zip, pplacer-Linux-v1.1.alpha17.zip.zip or pplacer-Linux-v1.1.alpha17.zip.ZIP.
ubuntu@metagenomics-standys:/usr/local/bin/MaxBin-2.2.4$ ls
autobuild_auxiliary buildapp COPYRIGHT _getmarker.pl LICENSE README.txt _sepReads.pl src
bacar_marker.hmm ChangeLog _getabund.pl heatmap.r marker.hmm run_MaxBin.pl setting
ubuntu@metagenomics-standys:/usr/local/bin/MaxBin-2.2.4$

Two mistakes already - I’m so sorry!

I’ve updated the original post, and you should be able to continue from (note the alpha19, not alpha17):

unzip pplacer-linux-v1.1.alpha19.zip
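
Note that the wget output above also ends with "Unable to establish SSL connection", so the zip file never arrived on that attempt. A small retry wrapper can help with flaky downloads. This is a sketch only: retry and RETRY_DELAY are names I have made up, not part of wget.

```shell
# Hypothetical helper: run a command up to ATTEMPTS times until it succeeds.
# Usage: retry ATTEMPTS command [args...]
# RETRY_DELAY (seconds between attempts) is an assumed variable, default 5.
retry() {
    attempts=$1
    shift
    n=1
    until "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            echo "giving up after $n attempts: $*" >&2
            return 1
        fi
        n=$((n + 1))
        sleep "${RETRY_DELAY:-5}"
    done
}

# Example (not run here): only unzip if the download actually succeeded
# retry 3 wget https://github.com/matsen/pplacer/releases/download/v1.1.alpha19/pplacer-linux-v1.1.alpha19.zip \
#     && unzip pplacer-linux-v1.1.alpha19.zip
```

Chaining with && means unzip is skipped when the download fails, so you see the real error rather than "cannot find or open".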

Thanks!

Another error when trying to unzip checkm_data_2015_01_16.tar.gz, maybe an informational one? (I can’t tell…)

ubuntu@metagenomics-standys:/usr/local/bin/MaxBin-2.2.4$ sudo tar xvzf /opt/checkm_data/checkm_data_2015_01_16.tar.gz -C /opt/checkm_data/
sudo: unable to resolve host metagenomics-standys
tar (child): /opt/checkm_data/checkm_data_2015_01_16.tar.gz: Cannot open: No such file or directory
tar (child): Error is not recoverable: exiting now
tar: Child returned status 2
tar: Error is not recoverable: exiting now

Another mistake on my part :roll_eyes:.

I’ve fixed this in the original post.

Thank you!

The ‘chmod -R 0755 /opt/checkm_data’ command returned a bunch of ‘chmod: changing permissions of…’ lines, and all had ‘Operation not permitted’ printed at the end. Is this an issue?

sudo chmod -R 0755 /opt/checkm_data

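is the correct command: /opt/checkm_data was created and unpacked with sudo, so the ubuntu user cannot change its permissions without sudo. Again, updated in the OP.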

OK I made it to the end thanks! Really appreciate your help with this.

Just one more question on this side of things - where in this VM should I be running the commands, or does that not matter? For most of the above I have been in /usr/local/bin/MaxBin-2.2.4 (as per the order of commands).

My next obstacle is to run some analyses on my data via an attached volume - I have put that up as a new topic (as if you weren’t already fed up of my questions…!)

No problem - sorry that this didn’t work out of the box!

It shouldn’t matter where exactly those commands were run from - I gave full, explicit paths wherever possible.

Please test all of the installed software, following the instructions from their user guides and manuals.
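
Before working through the manuals, a quick first check is whether each program can be found on the PATH at all. The sketch below uses a made-up helper name (check_tools), and the command names in the example are my assumption of what each package installs, so adjust them if yours differ.

```shell
# Hypothetical helper: report whether each named command is on the PATH.
# Returns 0 if all were found, 1 if any is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if path=$(command -v "$tool"); then
            echo "found    $tool -> $path"
        else
            echo "MISSING  $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example (not run here; command names are assumptions):
# check_tools metabat2 run_MaxBin.pl prodigal pplacer hmmsearch checkm
```

This only confirms that the commands are visible; it does not replace running each tool on a small test dataset.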