Problems With VM After Failed Snapshot-Based Backup - Unable to Access File Since It Is Locked
Whatever solution you use to back up your virtual infrastructure, you may sometimes end up with VM snapshots that need to be cleaned up. After a backup failure alert, I use the following PowerCLI one-liners to quickly identify and remove any snapshots left behind (by, say, NetApp SMVI).
Get-VM | Get-Snapshot | Where-Object {$_.Name -like 'smvi*'} | ft VM,Name,Created -AutoSize
Get-VM | Get-Snapshot | Where-Object {$_.Name -like 'smvi*'} | Remove-Snapshot -RunAsync -Confirm:$false
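Before removing anything, it can be worth previewing what the filter matches. The following is a minimal sketch, assuming an existing Connect-VIServer session and the same 'smvi*' name pattern as above; it lists the matching snapshots oldest first and then does a dry run with -WhatIf.

```powershell
# Assumes you are already connected with Connect-VIServer.
# 'smvi*' is the snapshot name pattern used in the one-liners above.
$leftovers = Get-VM | Get-Snapshot | Where-Object { $_.Name -like 'smvi*' }

# Show the oldest snapshots first, with their size, so stale ones stand out.
$leftovers |
    Sort-Object Created |
    Select-Object VM, Name, Created, SizeGB |
    Format-Table -AutoSize

# Dry run: report what would be removed without changing anything.
$leftovers | Remove-Snapshot -WhatIf
```

Once the list looks right, dropping -WhatIf and adding -Confirm:$false gives the same behaviour as the removal one-liner.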
Recently I had an instance where, after a backup failure, the snapshot could not be removed and the task failed with the error "Unable to communicate with the remote host, since it is disconnected."
From what I could ascertain, the backup failure had knocked the VM offline and left it marked as (Invalid) in the vCenter inventory. So step 1 was to remove the VM from the vCenter inventory and re-add it by right-clicking its VMX file. Once the VM was back in vCenter, I was able to remove the snapshot.
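The same remove and re-register step can be sketched in PowerCLI. This is an illustration rather than what I actually ran: the VM name, host name, and VMX path below are hypothetical placeholders, and a VM in an (Invalid) state may not respond to every cmdlet.

```powershell
# Hypothetical values: substitute your own VM name, host, and datastore path.
$vmName  = 'MyVM'
$vmxPath = '[datastore1] MyVM/MyVM.vmx'
$esxHost = Get-VMHost -Name 'esx01.example.local'

# Without -DeletePermanently, Remove-VM only unregisters the VM;
# its files are left on the datastore.
Get-VM -Name $vmName | Remove-VM -Confirm:$false

# Register the existing VMX file back into the inventory on the chosen host.
New-VM -VMFilePath $vmxPath -VMHost $esxHost
```

Note the VMX path before unregistering the VM, since it is harder to look up once the VM has left the inventory.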
There are loads of posts out there on how a file lock like this can happen if you are using a backup solution that mounts VMDKs into the backup appliance, and they suggest that removing the disk from the backup appliance and retrying should resolve the issue, including this VMware KB article.
I’m not sure if NetApp SMVI even works that way, and in any case the SMVI VM had no additional disks attached for that purpose. I powered off the SMVI VM anyway and tried the consolidation again, but still no luck.
The VM was now registered on a different host than it had been during the backup failure, but the file lock was still present. I decided to place the original host in maintenance mode in case it was still holding a lock on the file. The host failed to enter maintenance mode and hung at 83% for ages after migrating all VMs off. Restarting the ESXi management agents resolved this, and I was then able to place the host in maintenance mode and restart it for good measure.
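For reference, the maintenance mode and reboot steps look roughly like this in PowerCLI. It is a sketch under assumptions: the host name is a placeholder, and the management agent restart itself is not a PowerCLI operation, so it appears only as a comment.

```powershell
# Hypothetical host name: replace with the host that ran the VM during the backup.
$esxHost = Get-VMHost -Name 'esx01.example.local'

# Request maintenance mode asynchronously so the session is not blocked
# if the task stalls part-way through.
$task = Set-VMHost -VMHost $esxHost -State Maintenance -RunAsync

# If the task hangs, the management agents can be restarted on the host itself,
# for example from the ESXi shell:
#   /etc/init.d/hostd restart
#   /etc/init.d/vpxa restart
Wait-Task -Task $task

# Reboot the host once it is in maintenance mode.
Restart-VMHost -VMHost $esxHost -Confirm:$false
```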
Following the host restart, and with the SMVI VM still powered off, the consolidation was finally successful.
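To confirm nothing else is left needing consolidation, the vSphere API flag can be queried through PowerCLI. This is a minimal sketch, assuming a connected session; it uses the ConsolidationNeeded runtime property and the ConsolidateVMDisks_Task method exposed through ExtensionData.

```powershell
# List VMs that vCenter flags as needing disk consolidation.
$needsConsolidation = Get-VM |
    Where-Object { $_.ExtensionData.Runtime.ConsolidationNeeded }

$needsConsolidation | Select-Object Name, PowerState, VMHost | Format-Table -AutoSize

# Start a consolidation task for each flagged VM.
# The task reference is discarded here; progress can be followed in vCenter.
foreach ($vm in $needsConsolidation) {
    $vm.ExtensionData.ConsolidateVMDisks_Task() | Out-Null
}
```

If the file lock is still present, the consolidation task will fail in the same way as it does from the vSphere client, so this only saves clicking rather than working around the lock.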