vSphere bug with DRS, standby and non-persistent hard drives

We’ve been in touch with VMware recently about an issue we were experiencing in vSphere 4, where machines in standby could not be powered on. VMware have now confirmed that this is a bug, and that there will be a fix in R2.

While it’s fairly specific to our use-case, I thought I’d share the details in case anyone else runs into this.

First of all, this bug will only affect you if all of the following conditions are met:

  • You are using VMware vSphere 4.0 (or 4.0 Update 1)
  • Guest OS power-saving settings cause the virtual machine to enter standby
  • One or more of the guest’s hard drives are set to “independent non-persistent”
  • DRS is enabled on the virtual machine’s cluster
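To make the combination explicit, here is a minimal Python sketch that checks a VM's settings against the four conditions above. The `VmSettings` class and `is_affected` function are hypothetical; they are not part of any VMware tool. The string values are assumptions modelled on the vSphere API enums as we understand them: `"independent_nonpersistent"` for the disk mode, and `"checkpoint"` (suspend) versus `"powerOnSuspend"` (guest standby, VM left powered on) for the standby action.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical, simplified model of the settings relevant to this bug.
# String values are assumed to mirror vSphere API enum values:
#   disk modes: "persistent", "independent_persistent", "independent_nonpersistent", ...
#   standby action: "checkpoint" (suspend the VM) or
#                   "powerOnSuspend" (guest OS in standby, VM stays powered on)

AFFECTED_VERSIONS = {"4.0", "4.0 Update 1"}


@dataclass
class VmSettings:
    vsphere_version: str
    guest_power_saving: bool   # guest OS power-saving settings can trigger standby
    standby_action: str        # what vSphere does when the guest enters standby
    disk_modes: List[str] = field(default_factory=list)
    drs_enabled: bool = False  # DRS enabled on the VM's cluster


def is_affected(vm: VmSettings) -> bool:
    """Return True only if every condition from the list holds."""
    # The VM itself only "enters standby" (is suspended) when the guest's
    # power saving kicks in AND vSphere is set to suspend the VM.
    vm_enters_standby = vm.guest_power_saving and vm.standby_action == "checkpoint"
    return (
        vm.vsphere_version in AFFECTED_VERSIONS
        and vm_enters_standby
        and "independent_nonpersistent" in vm.disk_modes
        and vm.drs_enabled
    )


if __name__ == "__main__":
    vm = VmSettings("4.0 Update 1", True, "checkpoint", ["independent_nonpersistent"], True)
    print(is_affected(vm))  # True
```

Switching `standby_action` to `"powerOnSuspend"` in this model makes `is_affected` return `False`, which matches the second workaround described below.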

The machine enters standby as normal. The problem arises when you try to power the virtual machine back on: if DRS has allocated the machine to a different host based on load, the machine will not resume, and vSphere reports an error similar to the following:

“Virtual Machine is configured to use a device that prevents the operation: Device ‘Hard disk 1’ is disk which is not in persistent mode. Device ‘Hard disk 1’ which is not in persistent mode”.

Once the machine is in this state, you cannot manually migrate it (even back to the original host), change its power state, edit its settings, or delete it.

If this has happened to you, the only way we’ve found to get the machine back up and running is to remove it from the inventory, create a new virtual machine with the same specifications, and attach the old machine’s VMDK to it.

Fortunately, there are a couple of workarounds. You can either disable the power-saving settings in the guest OS, or change the guest power management setting from “Suspend the virtual machine” to “Put the guest OS into standby mode and leave the virtual machine powered on” (you can automate this as described in my previous post).

The second workaround has a drawback: when the guest enters standby, vSphere still shows the machine as “powered-on”, but VMware Tools is not running, which can cause problems (e.g., when trying to gracefully shut down a batch of machines).

This was also my first time working with VMware’s vSphere support, and I was impressed. They quickly replicated the problem and confirmed that it was indeed a bug. As most people nowadays use snapshots rather than non-persistent drives, and few users virtualise desktop operating systems (which are more likely to have power-saving settings enabled by default), I can understand why this particular set of circumstances went untested.