How do I install these tools into a VM? Do I need to set up a custom Ubuntu VM, or will it work on a GVL (where I can use other tools that are already on there)? Do I need to download installation files locally and transfer them, or is there a way to install directly from the source website onto the VM?
Sorry I’ve taken so long to get back to you - I wanted to write you a thorough and tested guide for how to go about this.
Because none of this software appears to be pre-installed on GVL or available through the brew package manager, I think the best option is to launch a custom VM. Follow the tutorial to generate an SSH key and use it to access the VM: Generating an SSH key on a Mac or on Windows.
Below is a list of commands that you should be able to copy and paste to install the software you linked (I tested them on Ubuntu 16.04). I’ve added comments (marked with #) to explain what happens at each stage.
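Since the commands were only tested on Ubuntu 16.04, it may be worth confirming which release your VM runs before you start. A minimal sketch, assuming the VM provides /etc/os-release (Ubuntu 16.04 and later do); os_version is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print VERSION_ID from an os-release file.
# Defaults to /etc/os-release, which Ubuntu 16.04 and later provide.
os_version() {
  # Source the file in a subshell so its variables don't leak into this shell
  ( . "${1:-/etc/os-release}" && printf '%s\n' "$VERSION_ID" )
}

os_version   # prints e.g. 16.04 on the release the guide was tested on
```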
## System packages
# Upgrade system packages
sudo apt-get update && sudo apt-get upgrade # Answer "y" when prompted
# Install system tools required for building the informatics software below
sudo apt-get install build-essential autoconf unzip python-minimal python-biopython python-pip # Answer "y" when prompted
## Metabat/Metabat2
# Download binaries
wget https://bitbucket.org/berkeleylab/metabat/downloads/metabat-static-binary-linux-x64_v2.12.1.tar.gz
# Unpack archive
tar xzvf metabat-static-binary-linux-x64_v2.12.1.tar.gz
# Rename directory to contain version name
mv metabat metabat-2.12.1
# Move directory to another directory on the PATH (explanation: http://www.linfo.org/path_env_var.html)
sudo mv metabat-2.12.1/ /usr/local/bin/
# Find all executable files (not directories) in the metabat directory and link them onto the PATH
find /usr/local/bin/metabat-2.12.1/ -mindepth 1 -type f -executable -exec sudo ln -s {} /usr/local/bin/ \;
## MaxBin
# Download archive
wget 'https://downloads.sourceforge.net/project/maxbin/MaxBin-2.2.4.tar.gz'
# Unpack archive
tar xvzf MaxBin-2.2.4.tar.gz
# Move directory to PATH
sudo mv MaxBin-2.2.4 /usr/local/bin/
# Compile the MaxBin executable (details in /usr/local/bin/MaxBin-2.2.4/README.txt)
cd /usr/local/bin/MaxBin-2.2.4/src
sudo make
# Download and install MaxBin dependencies using the bundled script (details in /usr/local/bin/MaxBin-2.2.4/README.txt)
cd /usr/local/bin/MaxBin-2.2.4
sudo ./autobuild_auxiliary # Answer "yes" at the CPAN prompt
# Make sure the ubuntu user is the owner of the dependency directory
sudo chown -R ubuntu:ubuntu /usr/local/bin/MaxBin-2.2.4/auxiliary
# Patch run_MaxBin.pl so that we can link it to the PATH with the next command
sudo sed -i 's/$Bin/$RealBin/g' /usr/local/bin/MaxBin-2.2.4/run_MaxBin.pl
# Link run_MaxBin.pl to PATH
sudo ln -s /usr/local/bin/MaxBin-2.2.4/run_MaxBin.pl /usr/local/bin/
## CheckM
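# Prodigal (CheckM dependency)
# Return to your home directory first, so that the downloads below are not saved inside /usr/local/bin/MaxBin-2.2.4
cd ~
# Download prodigal binary and rename to "prodigal"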
wget https://github.com/hyattpd/Prodigal/releases/download/v2.6.3/prodigal.linux -O prodigal
# Move it to PATH directory
sudo mv prodigal /usr/local/bin
# Make /usr/local/bin/prodigal executable
sudo chmod +x /usr/local/bin/prodigal
# pplacer (CheckM dependency)
# Download zip archive
wget https://github.com/matsen/pplacer/releases/download/v1.1.alpha19/pplacer-linux-v1.1.alpha19.zip
# Extract archive
unzip pplacer-linux-v1.1.alpha19.zip
# Install the python scripts bundled with pplacer
cd pplacer-Linux-v1.1.alpha19/scripts
sudo python setup.py install
# Move the pplacer directory to the PATH
cd ../../
sudo mv pplacer-Linux-v1.1.alpha19 /usr/local/bin/
# Find and link executables into PATH directory
find /usr/local/bin/pplacer-Linux-v1.1.alpha19/ -mindepth 1 -maxdepth 1 -type f -exec sudo ln -s {} /usr/local/bin/ \;
# HMMER (CheckM dependency)
# Shortcut: MaxBin's auxiliary script has already built HMMER, so just copy its executables into a PATH directory
find /usr/local/bin/MaxBin-2.2.4/auxiliary/hmmer-3.1b1/src -mindepth 1 -type f -executable -exec sudo cp {} /usr/local/bin/ \;
# CheckM
# Make directory and download the checkm database
sudo mkdir /opt/checkm_data
wget https://data.ace.uq.edu.au/public/CheckM_databases/checkm_data_2015_01_16.tar.gz
# Unpack the database into the newly created directory
sudo tar xvzf checkm_data_2015_01_16.tar.gz -C /opt/checkm_data/
# Make sure everyone can read the database
sudo chmod -R 0755 /opt/checkm_data
# Install checkm with pip
sudo pip install checkm-genome
# Point checkm at the database we downloaded and unpacked
sudo checkm data setRoot # answer "/opt/checkm_data/" when prompted
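Once everything is installed, a quick way to confirm that each tool is reachable from the PATH is to ask the shell where it finds each command. A minimal sketch; check_tools is a hypothetical helper, and the executable names are my assumption about the main entry point of each package installed above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether each named program is on the PATH.
# Returns the number of programs that could not be found (0 = all found).
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf 'found    %s -> %s\n' "$tool" "$(command -v "$tool")"
    else
      printf 'MISSING  %s\n' "$tool"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# Assumed main executables of the packages installed above
check_tools metabat2 run_MaxBin.pl prodigal pplacer hmmsearch checkm \
  || echo "Some tools are missing from the PATH"
```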
Hopefully that shows, step by step, how to install software on a fresh VM. The process is much the same for every package:
Check the software's manual or webpage for installation instructions.
Download the software from the internet.
Compile or set it up if needed.
Move or link it to a directory on the PATH so that you can run the program from anywhere on your VM.
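The last three steps can be wrapped in a small shell function if you end up installing many packages this way. This is only a sketch: install_from_archive, PREFIX, BIN_DIR and SUDO are hypothetical names, it assumes a .tar.gz that unpacks into a single directory, and it only links executables at the top level of that directory (packages like MaxBin that need compiling still need their own steps):

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the guide's pattern: unpack an archive,
# move the unpacked directory under PREFIX, then symlink its top-level
# executables into BIN_DIR so they can be run from anywhere.
# PREFIX and BIN_DIR default to /usr/local/bin as in the guide;
# SUDO defaults to "sudo" (set SUDO="" to run without it).
install_from_archive() {
  local archive="$1" dir="$2"
  local prefix="${PREFIX:-/usr/local/bin}" bin_dir="${BIN_DIR:-/usr/local/bin}"
  local sudo="${SUDO-sudo}"
  tar xzf "$archive" || return 1                  # unpack
  $sudo mv "$dir" "$prefix/" || return 1          # move under PREFIX
  find "$prefix/$dir" -mindepth 1 -maxdepth 1 -type f -executable \
    -exec $sudo ln -sf {} "$bin_dir/" \;          # link executables
}

# Usage on the VM, after downloading the archive with wget:
#   install_from_archive tool-1.0.tar.gz tool-1.0
```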
Please let me know if something doesn’t work, or a step is unclear.
I have created a custom VM as a climb.group flavor (I hope to share this instance with a few others later this week as a quick demo of my analyses). I created a new SSH key for this and pasted the public key into the box on the website so that I could launch the server, which I now have. I copied the key to the ~/.ssh folder on my Mac and ran chmod 600 on it. Then I tried to ssh to the server from Terminal by typing
ssh ubuntu@
I got the following error message (I replaced IP address and ECDSA key fingerprint with *):
The authenticity of host '...** (...)’ can’t be established.
ECDSA key fingerprint is ****************************.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '...’ (ECDSA) to the list of known hosts.
ubuntu@...***: Permission denied (publickey).
What am I not doing?
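One common cause of "Permission denied (publickey)" is that ssh only offers keys with its default file names (such as ~/.ssh/id_rsa) or keys already loaded into ssh-agent, so a key saved under a different name has to be given explicitly with -i or in ~/.ssh/config. A sketch, assuming that is the problem here; the key path ~/.ssh/climb_vm_key, the address 203.0.113.10, the alias climb-vm and the add_ssh_host helper are all hypothetical placeholders:

```shell
#!/usr/bin/env bash
# Hypothetical helper: register a host alias in ~/.ssh/config that always
# uses the given private key and logs in as the "ubuntu" user.
add_ssh_host() {
  local alias="$1" address="$2" key="$3"
  mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"
  chmod 600 "$key"   # ssh refuses private keys that others can read
  cat >> "$HOME/.ssh/config" <<EOF
Host $alias
    HostName $address
    User ubuntu
    IdentityFile $key
    IdentitiesOnly yes
EOF
  chmod 600 "$HOME/.ssh/config"
}

# One-off connection with an explicit key (hypothetical key and address):
#   ssh -i ~/.ssh/climb_vm_key ubuntu@203.0.113.10
# Or register it once, then connect with just the alias:
#   add_ssh_host climb-vm 203.0.113.10 ~/.ssh/climb_vm_key
#   ssh climb-vm
```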
On another note, I want to create a volume for my data to attach to this instance, and I see that to do this I need to log into Horizon. A username and password were generated on Bryn when I launched this instance, but when I follow the hyperlink to Horizon I am asked for a domain. What should I put here?
Thanks, I’m now in. But Horizon doesn’t let me leave the Domain field blank: “This field is required.” Am I meant to be using the username and password generated when I launched my instance?
I’m working through the software installation commands you put together and ran into this issue (note that I am working in a different custom group VM from the one I mentioned before, hence the different name):
ubuntu@metagenomics-standys:~$ mv metabat metabat-2.21.1
ubuntu@metagenomics-standys:~$ sudo mv metabat-2.21.1/ /usr/local/bin/
sudo: unable to resolve host metagenomics-standys
ubuntu@metagenomics-standys:~$ sudo mv metabat-2.21.1/ /usr/local/bin/
sudo: unable to resolve host metagenomics-standys
mv: cannot stat ‘metabat-2.21.1/’: No such file or directory
The message "sudo: unable to resolve host metagenomics-standys" is just a warning. The mv command completed successfully, which is why running it a second time fails with "No such file or directory": the directory you are trying to move has already been moved to /usr/local/bin.
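If you'd like to get rid of the warning itself: it appears because the VM's hostname isn't listed in /etc/hosts, and on Ubuntu the usual fix is to map the hostname to 127.0.1.1 there. A minimal sketch; fix_hostname_resolution, HOSTS_FILE and SUDO are hypothetical names I've added so that the file and the use of sudo can be swapped out:

```shell
#!/usr/bin/env bash
# Hypothetical helper: append "127.0.1.1 <hostname>" to the hosts file
# unless the hostname is already listed there.
# Optional argument: hostname (defaults to the output of the hostname command).
# HOSTS_FILE defaults to /etc/hosts; SUDO defaults to "sudo".
fix_hostname_resolution() {
  local name="${1:-$(hostname)}" hosts="${HOSTS_FILE:-/etc/hosts}"
  # Look for the exact name among the host fields of non-comment lines
  if awk -v h="$name" '$1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
                       END { exit !found }' "$hosts"; then
    echo "$name is already listed in $hosts"
  else
    echo "127.0.1.1 $name" | ${SUDO-sudo} tee -a "$hosts" >/dev/null
  fi
}

# On the VM: fix_hostname_resolution
```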
I’m following your commands and trying to extract the archive for CheckM after downloading the zip archive, but I don’t think the archive downloaded, because I get the error:
unzip: cannot find or open pplacer-Linux-v1.1.alpha17.zip, pplacer-Linux-v1.1.alpha17.zip.zip or pplacer-Linux-v1.1.alpha17.zip.ZIP.
Full sequence of commands and responses on this bit:
Another error when trying to unpack checkm_data_2015_01_16.tar.gz. Maybe it’s just an informational one? (I can’t tell…)
ubuntu@metagenomics-standys:/usr/local/bin/MaxBin-2.2.4$ sudo tar xvzf /opt/checkm_data/checkm_data_2015_01_16.tar.gz -C /opt/checkm_data/
sudo: unable to resolve host metagenomics-standys
tar (child): /opt/checkm_data/checkm_data_2015_01_16.tar.gz: Cannot open: No such file or directory
tar (child): Error is not recoverable: exiting now
tar: Child returned status 2
tar: Error is not recoverable: exiting now
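Both of these look like the same kind of problem: the command is looking for the file under a different name, or in a different place, from where wget saved it. The unzip error mentions pplacer-Linux-v1.1.alpha17.zip, but the guide downloads pplacer-linux-v1.1.alpha19.zip (lowercase "linux", alpha19), and Linux file names are case-sensitive. The tar command looks in /opt/checkm_data/, but wget saves the archive into whatever directory you were in when you ran it. A small sketch for tracking a download down; find_download is a hypothetical helper and the search depth of 3 is an arbitrary choice:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list files whose names match a pattern
# (case-insensitively) in the given directories, up to 3 levels deep.
# With no directories given, search the current directory.
find_download() {
  local pattern="$1"
  shift
  if [ "$#" -eq 0 ]; then
    set -- .
  fi
  find "$@" -maxdepth 3 -iname "$pattern" 2>/dev/null
}

# Examples:
#   find_download 'pplacer-*.zip' ~ /usr/local/bin
#   find_download 'checkm_data_*.tar.gz' ~ /usr/local/bin
# Then unpack from the directory where the archive really is, e.g.:
#   sudo tar xvzf checkm_data_2015_01_16.tar.gz -C /opt/checkm_data/
```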
The ‘chmod -R 0755 /opt/checkm_data’ command returned a bunch of ‘chmod: changing permissions of…’ lines, and every line ended with ‘Operation not permitted’. Is this an issue?
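Those messages mean the permissions were not changed. chmod can only change a file's mode if you own the file or are root, and /opt/checkm_data was created with sudo mkdir, so it belongs to root; running the same command with sudo, as in the guide, should work. A minimal sketch of checking this first; needs_sudo is a hypothetical helper and assumes GNU stat (as on Ubuntu):

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if the current user is neither
# root nor the owner of the given path, i.e. chmod on it will need sudo.
# Uses GNU stat's -c option, as found on Ubuntu.
needs_sudo() {
  [ "$(id -u)" -ne 0 ] && [ "$(stat -c %U "$1")" != "$(id -un)" ]
}

# Usage on the VM:
#   ls -ld /opt/checkm_data      # the owner column should read root
#   if needs_sudo /opt/checkm_data; then
#     sudo chmod -R 0755 /opt/checkm_data
#   fi
```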
OK I made it to the end thanks! Really appreciate your help with this.
Just one more question on this side of things: where in this VM should I be running the commands, or does that not matter? For most of the above I have been in /usr/local/bin/MaxBin-2.2.4 (as per the order of commands).
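It mostly doesn't matter, with one catch: the commands that use relative file names (wget, tar, unzip, mv) all act on the current working directory, so after the cd into /usr/local/bin/MaxBin-2.2.4 the later downloads ended up there. Running pwd shows where you are, and cd ~ takes you back to your home directory. One way to avoid the question altogether is to send every download to a single scratch directory. A sketch, where DOWNLOAD_DIR, ~/downloads and fetch are hypothetical names; it uses wget's -P (directory prefix) option, plus -N (timestamping) so that a re-run overwrites the file instead of saving a numbered copy:

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory for downloads; any writable path works.
DOWNLOAD_DIR="${DOWNLOAD_DIR:-$HOME/downloads}"
mkdir -p "$DOWNLOAD_DIR"

# Hypothetical helper: download a URL into DOWNLOAD_DIR and print the
# saved file's path. Assumes the URL ends in the file name (no query string).
fetch() {
  wget -q -N -P "$DOWNLOAD_DIR" "$1" && printf '%s\n' "$DOWNLOAD_DIR/${1##*/}"
}

pwd   # show the current working directory
# Example:
#   archive=$(fetch https://data.ace.uq.edu.au/public/CheckM_databases/checkm_data_2015_01_16.tar.gz)
#   sudo tar xvzf "$archive" -C /opt/checkm_data/
```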
My next obstacle is to run some analyses on my data via an attached volume - I have put that up as a new topic (as if you weren’t already fed up of my questions…!)