Running Roary on CLIMB

ivaatanas · March 8, 2017, 3:24pm

Hello! I am trying to run a set of around 1000 gff files and I have no idea how to set up Roary on CLIMB. Can somebody please explain to me what is the best way to go, or at least direct me to some tutorials? Thank you!

mattbull · March 8, 2017, 3:55pm

Roary is pre-installed on GVL instances, so I recommend that you use a GVL instance for this.

Once you’ve launched your GVL instance, and before you start your analysis, you might want to upgrade Roary to the latest version using linuxbrew, by running

brew upgrade roary

You can then check the Roary documentation here: https://sanger-pathogens.github.io/Roary/.

Initially, it should just be as simple as running

roary -h

to see the command usage help, and piecing together the command that you’d like to run based on the documentation and the command usage help.

As you already have annotated sequences in .gff format, there’s not much more preparation that you need to do before running the pan genome analysis. Use scp to transfer your .gff files to your GVL instance, either using the unix terminal or Windows software like WinSCP or MobaXterm, then from the directory you’ve uploaded to, use

roary *.gff

to run the pan genome analysis including every .gff file in the current directory.
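
For a dataset of this size it may help to set the thread count and output directory explicitly rather than relying on the defaults. The following is only a sketch, assuming the flags listed by roary -h in your installed version match the ones used here (-p for threads, -f for the output directory, -e together with -n for a core gene alignment with MAFFT, -v for verbose output). GFF_DIR, OUT_DIR, THREADS and the two helper functions are hypothetical names made up for this example.

```shell
#!/bin/sh
# Hypothetical settings for this sketch; adjust to your instance.
GFF_DIR="${GFF_DIR:-.}"             # directory holding the uploaded .gff files
OUT_DIR="${OUT_DIR:-roary_output}"  # directory Roary should write results to
THREADS="${THREADS:-8}"             # number of CPU threads to give Roary

# Count the .gff files so an empty directory is caught before a long run starts.
count_gff() {
    n=0
    for f in "$1"/*.gff; do
        [ -e "$f" ] && n=$((n + 1))
    done
    echo "$n"
}

# Run Roary on every .gff file in GFF_DIR.
# -p threads, -f output directory, -e -n core gene alignment with MAFFT, -v verbose
run_roary() {
    if [ "$(count_gff "$GFF_DIR")" -eq 0 ]; then
        echo "No .gff files found in $GFF_DIR" >&2
        return 1
    fi
    roary -p "$THREADS" -f "$OUT_DIR" -e -n -v "$GFF_DIR"/*.gff
}

# Only start the run when Roary is actually on the PATH.
if command -v roary >/dev/null 2>&1; then
    run_roary
else
    echo "roary not found on PATH" >&2
fi
```

Check the flags against roary -h before committing to a long run, since options can differ between versions.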

ivaatanas · March 8, 2017, 4:30pm

Oh, thank you very much, I did not realize that Roary is pre-installed on GVL! I hope it will work. Thank you again.

ivaatanas · March 9, 2017, 10:58am

Dear Matt,

I tried doing a MAFFT alignment with Roary overnight and my run crashed with this error: write failed - broken pipeline. As far as I understand, this means that the ssh connection was broken? Or is there something wrong with Roary (I checked, the installation is up to date)? Do you have any idea how to prevent this? I think that my run might take around 10 hours, since I have a lot of genomes. Thank you!

mattbull · March 9, 2017, 11:37am

Glad you’ve got Roary running!

Your disconnection problem is a common one - wanting to (or being forced to) disconnect from your remote server while something is still running, causing the program to end.

There are a couple of options for getting around this…

1. Use nohup

This tells the program you are running to ignore the hangup (logout) signal and can be used like this:

nohup [your command here] &

You’ll get a message about output redirection, then you can disconnect from your remote server (Ctrl+D), reconnect, and the program will still be running (good news!). Any terminal output from the program gets redirected to a file called nohup.out in the current directory. Be sure to read the program log output to make sure everything went to plan!
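
If you would rather not rely on nohup.out, you can send the output to a log file of your choosing and record the process ID, so the job is easy to check on after reconnecting. A minimal sketch; start_detached, the log file name and the .pid file are hypothetical names for this example, and the Roary flags in the commented-out usage line should be checked against roary -h.

```shell
#!/bin/sh
# start_detached LOGFILE COMMAND [ARGS...]
# Runs COMMAND under nohup in the background, sends stdout and stderr to LOGFILE,
# and stores the process ID in LOGFILE.pid.
start_detached() {
    log="$1"
    shift
    nohup "$@" > "$log" 2>&1 < /dev/null &
    echo "$!" > "$log.pid"
}

# Example (commented out): a Roary run that survives logout.
# start_detached roary.log roary -p 8 -e -n -v *.gff

# After reconnecting:
# tail -f roary.log                                        # follow the log
# kill -0 "$(cat roary.log.pid)" && echo "still running"   # check the process
```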

2. Use tmux (or GNU screen)

tmux is a terminal multiplexer, which allows you to start new terminals then detach from them, leaving a program running in the detached terminal. You can reattach later.

You can install tmux on GVL using (if it’s not already there):

sudo apt-get install tmux

There are lots of good beginner’s guides to tmux, so I won’t go too far into it here, but in brief:

Start a new tmux session:
tmux

Detach from a tmux session:
Ctrl+b, d

List running tmux sessions when detached from all sessions:
tmux ls

Reattach to a running tmux session:
tmux attach -t [session number]

You can also do nifty things, like splitting your current session into panes and running multiple windows in the same session - check one of those guides and the manual (man tmux).
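
For a long job it can be convenient to give the session a name and start it already detached, so there is nothing to remember apart from the name. A minimal sketch using tmux has-session and tmux new-session -d -s; SESSION and start_session are hypothetical names for this example. Note that a session started this way closes when its command finishes, which is why the commented-out usage line also writes a log file.

```shell
#!/bin/sh
SESSION="${SESSION:-roary}"   # hypothetical session name

# Start a detached, named session running the given command,
# unless a session with that name already exists.
start_session() {
    if tmux has-session -t "$SESSION" 2>/dev/null; then
        echo "Session '$SESSION' already exists; reattach with: tmux attach -t $SESSION"
    else
        tmux new-session -d -s "$SESSION" "$@"
        echo "Started '$SESSION'; attach with: tmux attach -t $SESSION"
    fi
}

# Example (commented out):
# start_session 'roary -p 8 -e -n -v *.gff 2>&1 | tee roary.log'
```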

ivaatanas · March 9, 2017, 11:57am

Thank you very much for such a fast reply! I hope that this will solve the problem :))

ivaatanas · March 9, 2017, 5:47pm

Dear Matt,

We are also having some problems with mounting a volume at the directory in which we are trying to run Roary. Because Roary makes big temporary files, we need more than 120 GB. If I mount a new volume according to the tutorial (sudo mount /dev/vdc1 my_dir), Roary cannot run in my_dir. It works if I start Roary in a directory and then mount the volume after starting the run, but in that case I cannot see any of my files (in my_dir I can only see lost+found, which I cannot access). I have a strong feeling that I am doing something wrong and that there is a simple solution to this. Do you maybe know how I can solve the problem?

Thank you!

mattbull · March 9, 2017, 8:27pm

Okay, couple of things to check:

1. That the volume is actually mounted where you think it is

Use the lsblk command to list all block devices (volumes) and where they’re currently mounted. If this is the first volume that you’ve attached to your instance, it’s unlikely to be listed as /dev/vdc, so check this thoroughly and paste the output if you aren’t sure.

2. That you have permission to write to the directory containing the mounted volume

If you can’t make a new file in the directory that you mounted your volume at (touch newfile), make sure you changed ownership from the root user to the ubuntu user with
sudo chown ubuntu:ubuntu [mountpoint directory]

Check directory ownership with
ls -l

3. That your new volume isn’t mounted on the directory containing your .gff files

If the directory that you mount your new volume at already contains files, these will be “shadowed” from the system and won’t be visible until you unmount the volume. The way to work around this is to mount your new volume at an empty directory, then use mv to move your .gff files into this new directory.
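
The three checks above can be sketched as a short script. This is only an illustration under stated assumptions: DEVICE, MOUNTPOINT, /path/to/gffs and the two helper functions are hypothetical names, and the real device name must come from your own lsblk output. The steps that need sudo and a real volume are left commented out.

```shell
#!/bin/sh
# Hypothetical names for this sketch; use the device name that lsblk reports.
DEVICE="${DEVICE:-/dev/vdb}"
MOUNTPOINT="${MOUNTPOINT:-$HOME/roary_work}"

# True when the directory exists and contains no entries,
# so mounting a volume there will not shadow any files.
dir_is_empty() {
    [ -d "$1" ] && [ -z "$(ls -A "$1")" ]
}

# True when the current user can create a file in the directory.
can_write() {
    touch "$1/.write_test" 2>/dev/null && rm -f "$1/.write_test"
}

# The steps themselves (commented out; they need sudo and a real volume):
# lsblk                                        # 1. confirm device name and mount point
# mkdir -p "$MOUNTPOINT"
# dir_is_empty "$MOUNTPOINT" || echo "mount point is not empty" >&2
# sudo mount "$DEVICE" "$MOUNTPOINT"
# can_write "$MOUNTPOINT" || sudo chown ubuntu:ubuntu "$MOUNTPOINT"   # 2. fix ownership
# mv /path/to/gffs/*.gff "$MOUNTPOINT"/        # 3. move the inputs onto the new volume
```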

ivaatanas · March 10, 2017, 12:28pm

Thank you Matt, I had no idea that I had to change ownership after mounting. Just to make sure that I did not mess anything up, I will paste the outputs of lsblk and ls -l. Could you please take a look? The volume that I am mounting is listed as /dev/vdb and my working directory is called Iva.

MOUNT_CHECK

NAME                              MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
vda                               253:0    0  120G  0 disk
└─vda1                            253:1    0  120G  0 part /
vdb                               253:16   0 1000G  0 disk /home/ubuntu/Iva
loop0                             7:0      0  100G  0 loop
└─docker-253:1-269678-pool (dm-0) 252:0    0  100G  0 dm
loop1                             7:1      0    2G  0 loop
└─docker-253:1-269678-pool (dm-0) 252:0    0  100G  0 dm

OWNERSHIP_CHECK

total 8692872
-rw-rw-r-- 1 ubuntu ubuntu 9292676 Mar 10 11:44 ISOLATE_101131.gff

… all the gff-s (looks the same, the list is too big to paste)

-rw-rw-r-- 1 ubuntu ubuntu 10065496 Mar 10 11:47 ISOLATE_99693.gff
drwx------ 2 root root 16384 Mar 10 11:26 lost+found
drwxrwxr-x 2 ubuntu ubuntu 4096 Mar 10 12:04 test

mattbull · March 10, 2017, 12:53pm

That all looks like it will work!

As long as your .gff files are in the /home/ubuntu/Iva directory, which they appear to be, you should be sorted.

You could test that Roary runs to completion on a subset (say 10 or so) of your .gff files, if you like. The easiest way would be to make a new directory in the /home/ubuntu/Iva directory, copy 10 .gffs into it, and then run the Roary command that you intend to run on your larger dataset. That could save you some time for testing purposes…
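
A minimal sketch of that subset test, assuming the first few files in glob order are as good a sample as any. copy_subset is a hypothetical helper written for this example, and the Roary flags in the commented-out usage lines should be replaced with whatever you intend to use on the full dataset.

```shell
#!/bin/sh
# copy_subset SRC_DIR DEST_DIR N
# Copies the first N .gff files (in glob order) from SRC_DIR into DEST_DIR
# and prints how many files were copied.
copy_subset() {
    src="$1"; dest="$2"; want="$3"
    mkdir -p "$dest"
    copied=0
    for f in "$src"/*.gff; do
        [ -e "$f" ] || continue
        [ "$copied" -ge "$want" ] && break
        cp "$f" "$dest"/
        copied=$((copied + 1))
    done
    echo "$copied"
}

# Example (commented out), using the directories from this thread:
# copy_subset /home/ubuntu/Iva /home/ubuntu/Iva/test 10
# cd /home/ubuntu/Iva/test && roary -p 8 -e -n -v *.gff
```

Copying rather than moving keeps the full dataset untouched, so the test directory can simply be deleted afterwards.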

ivaatanas · March 10, 2017, 12:57pm

Thank you! Yes, I had the same idea and I did the check - it all worked fine. So hopefully this time it will not crash.