Slow disk speed when using a Birmingham-based virtual machine

Hi,
I’ve been using a virtual machine in Birmingham which is running Ubuntu 14.04 and hosted at IP address 147.188.173.25. Disk performance seems slow, even when the server is idle, and has been for a number of weeks. The issue affects all of my existing volumes (vda–vdc) but doesn’t seem to affect newly created ones (e.g. vdd). I’ve run some tests using hdparm.

sudo hdparm -tT /dev/vda
/dev/vda:
Timing cached reads: 15136 MB in 1.99 seconds = 7595.83 MB/sec
Timing buffered disk reads: 18 MB in 3.02 seconds = 5.95 MB/sec

sudo hdparm -tT /dev/vdb
/dev/vdb:
Timing cached reads: 15022 MB in 1.99 seconds = 7538.13 MB/sec
Timing buffered disk reads: 10 MB in 3.01 seconds = 3.33 MB/sec

sudo hdparm -tT /dev/vdc
/dev/vdc:
Timing cached reads: 13868 MB in 1.99 seconds = 6955.56 MB/sec
Timing buffered disk reads: 20 MB in 3.18 seconds = 6.29 MB/sec

sudo hdparm -tT /dev/vdd
/dev/vdd:
Timing cached reads: 14786 MB in 1.99 seconds = 7417.09 MB/sec
Timing buffered disk reads: 496 MB in 3.00 seconds = 165.08 MB/sec
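If it’s useful, I can run a write test as well; something like the following (the output path here is just a scratch file on one of the affected volumes, so treat it as a placeholder):

sudo dd if=/dev/zero of=/mnt/vdb/ddtest bs=1M count=512 oflag=direct
# oflag=direct bypasses the page cache, so the reported rate reflects the disk itself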

Thanks,
David

Hi David - yeah, that seems slow. I’ll look into what might be causing this and get back to you.

In the meantime, have you tried a new VM? I notice that the slow volumes are all hosted on GPFS at Birmingham, while the newer one (which seems to be okay) is stored on Ceph. All new VMs at Birmingham have their root disks stored on Ceph, so this might make a difference.
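For what it’s worth, you can check which backend a given volume is on yourself if you have the OpenStack command-line clients configured; the volume type field should tell you (a sketch, with <volume-id> standing in for whichever volume you want to check):

openstack volume show <volume-id> | grep -i type
# the type/volume_type field reports the storage backend the volume was created on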

Thanks, I’ll have a go at copying the data across to Ceph. There are also two volumes that I either can’t retrieve information on or that have got stuck whilst detaching from the host. Any chance the two issues are related?

https://birmingham.climb.ac.uk/dashboard/project/volumes/5d693a78-2b9b-4281-91cc-07e80ed542ad/
https://birmingham.climb.ac.uk/dashboard/project/volumes/88f09e35-d439-46b8-b090-684d00178612/

Thanks for the links, very helpful!

We’ve detached these volumes for you; they were definitely stuck.
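For reference, clearing them mostly came down to resetting the volume state on our side, roughly along these lines (a sketch; this needs admin credentials, so it isn’t something you could have run yourself):

cinder reset-state --state available <volume-id>
# forces the volume record out of the stuck 'detaching' state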

I think I can offer a tentative explanation for the poor performance of your GPFS volumes as well: GPFS utilisation at Birmingham is incredibly high, which is affecting the read/write speeds of volumes. Your specific manifestation of the problem comes down to a couple of factors:

  1. Because of system design decisions made early in the project, all VMs at Birmingham were originally launched with GPFS-backed boot volumes. We later changed this so that all new VMs launch with Ceph-backed boot volumes. Your instance and volumes predate that change, so they have GPFS-backed disks.
  2. The early CLIMB adopters (and users internal to the project, mentioning no names…!) still use GPFS extensively to store and intensively process large amounts of data. Coupled with the fact that all sites have reached their design capacity (200+ VMs per site), GPFS access contention has risen sharply since November, which is when you started to notice problems.

So, my apologies that you’ve experienced this performance degradation. It’s annoying and it wasn’t intended on our part.

The best fix I can offer at the moment is the one described above: start a new VM (its boot disk will be stored on Ceph), create some Ceph-backed volumes, and rsync your data into the new VM. Apologies for the inconvenience, but I suspect you’ll see disk performance return to previous levels.
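A rough outline of the whole process, in case it’s useful (the volume name, size, device path and mount point below are all placeholders for your setup):

# create a Ceph-backed volume and attach it to the new VM
openstack volume create --size 100 migrated-data
openstack server add volume <new-vm> migrated-data

# inside the new VM: make a filesystem, mount it, and pull the data across
sudo mkfs.ext4 /dev/vdb
sudo mkdir -p /mnt/data
sudo mount /dev/vdb /mnt/data
rsync -avP <user>@<old-vm>:/path/to/data/ /mnt/data/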

Please drop me a message if there’s anything I can do to help.
