This month I shall mostly be wearing... an ESX hat.
During the last month, whilst some colleagues have been away on holiday, I’ve been spending a lot more time than normal with our Virtual Infrastructure, and I thought I’d share a few tips I’ve picked up along the way. They are probably no-brainers for hardcore VMware administrators, but for those like me who aren’t 100% dedicated to one specific area, and particularly those who have long been Windows admins, there’s nothing like getting deeply involved with something for a few weeks to really get to grips with how it works.
The tasks and troubleshooting you carry out in the VI client are pretty intuitive, but when you have to delve a bit deeper and venture into the ESX Service Console, it’s time to be thankful for that well-spent time learning PowerShell and the PowerCLI toolkit, and for being generally proficient with command-line troubleshooting.
1) Kill a VM which won’t power off
I tried to shut down a virtual machine and then used the Power Off button, but it would not power off at all. Use these Service Console commands:
vm-support -x to get the world IDs (WIDs) of the VMs on the host, and
vm-support -X <WID> to kill the VM in question.
http://www.yellow-bricks.com/2009/04/15/the-basics-how-to-kill-a-vm-thats-stuck-during-shutdown/
http://www.techhead.co.uk/vmware-esx-how-to-clear-a-hung-vm
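The two steps above can be sketched as a pair of shell functions, assuming bash on the ESX 3.x Service Console. The names `list_vms` and `kill_vm` and the `RUN=echo` dry-run switch are my own hypothetical additions, not part of vm-support, and `vm-support -X` may prompt you interactively before it actually terminates the VM.

```shell
#!/bin/bash
# Sketch for the ESX 3.x Service Console: find and kill a hung VM.
# Dry run: prefix a call with RUN=echo to print the command instead of running it.

# List the running VMs on this host together with their world IDs (WIDs).
list_vms() {
    ${RUN:-} vm-support -x
}

# Kill one VM by world ID. Anything that is not a plain number is refused,
# so a typo is never passed straight through to vm-support.
kill_vm() {
    local wid="$1"
    case "$wid" in
        ''|*[!0-9]*)
            echo "usage: kill_vm <numeric WID from 'vm-support -x'>" >&2
            return 1
            ;;
    esac
    # vm-support -X collects debug data for the VM and can then terminate it;
    # it may ask questions interactively before doing so.
    ${RUN:-} vm-support -X "$wid"
}
```

Usage: run `list_vms`, note the WID of the stuck VM, then run `kill_vm <WID>`; `RUN=echo kill_vm <WID>` only prints the command it would run.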
2) Switch the Service Console NIC
One of the ports on a 4-port NIC failed, and it was the port being used by the Service Console; consequently the host was unavailable in VirtualCenter, although all the VMs were still running. I had another NIC available, so I re-patched, but obviously I couldn’t use the VI client to change the Service Console NIC.
Check your available NICs:
esxcfg-nics -l
Check which NIC is connected to the Service Console switch (look at the “Uplinks” of vSwitch0):
esxcfg-vswitch -l
Unlink the current vmnic from vSwitch0:
esxcfg-vswitch -U vmnic0 vSwitch0
Link another NIC to vSwitch0:
esxcfg-vswitch -L vmnic1 vSwitch0
http://www.experts-exchange.com/Software/VMWare/Q_23027712.html
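The same steps can be wrapped in a small shell function, sketched here assuming bash on the ESX Service Console. `swap_uplink` and the `RUN=echo` dry-run switch are hypothetical helpers of my own, and vmnic0/vmnic1 are just the example names used above.

```shell
#!/bin/bash
# Sketch: move a vSwitch (vSwitch0 by default, the Service Console switch)
# from a failed uplink to a working one using esxcfg-vswitch.
# Dry run: prefix a call with RUN=echo to print the commands instead.

swap_uplink() {
    local old_nic="$1" new_nic="$2" vswitch="${3:-vSwitch0}"
    if [ -z "$old_nic" ] || [ -z "$new_nic" ]; then
        echo "usage: swap_uplink <old vmnic> <new vmnic> [vSwitch]" >&2
        return 1
    fi
    # Link the replacement NIC first so the vSwitch always has an uplink.
    ${RUN:-} esxcfg-vswitch -L "$new_nic" "$vswitch" || return 1
    # Then remove the failed NIC from the vSwitch.
    ${RUN:-} esxcfg-vswitch -U "$old_nic" "$vswitch" || return 1
    # Show the resulting configuration so the Uplinks column can be checked.
    ${RUN:-} esxcfg-vswitch -l
}
```

This sketch links the new NIC before unlinking the failed one, the reverse of the order given above, so the vSwitch is never left without an uplink; since the old port was dead anyway, either order should end up in the same place.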
3) VMotion hangs at 10% or 94%
I was moving a stack of VMs around with VMotion when a couple got stuck at 94%, and all the others stayed queued behind them. Cancelling these moves in the VI client didn’t do anything other than change the status of the queued VMotions to ‘cancelled’. The couple that were stuck at 94% actually seemed to have completed successfully, i.e. they were on the new hosts. To clear these errors and allow further VMotions to take place, I had to run the following on both the source and destination hosts (before running the first command, check that you do not have Automatic Startup / Shutdown enabled):
service mgmt-vmware restart
and
service vmware-vpxa restart
The first time I just went with
service vmware-vpxa restart
only, but then subsequent VMotions got stuck at 10%.
I’m not sure why this happened. A rule of thumb I adopted afterwards was not to submit more than 4 VMotion requests at a time, but I don’t have any technical reasons to back this up.
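If you end up doing this on several hosts, the two restarts can be kept together in one function, sketched below assuming bash on the Service Console. `restart_mgmt_agents` and `RUN=echo` are my own illustrative names. The function does not check the Automatic Startup / Shutdown setting for you, so that check remains a manual step.

```shell
#!/bin/bash
# Sketch: restart both management agents after a stuck VMotion.
# Run it on BOTH the source and the destination host.
# Assumption: you have already confirmed Automatic Startup / Shutdown is disabled.
# Dry run: prefix a call with RUN=echo to print the commands instead.

restart_mgmt_agents() {
    # mgmt-vmware is the host agent (hostd); restarting only vpxa was not enough.
    ${RUN:-} service mgmt-vmware restart || return 1
    # vmware-vpxa is the VirtualCenter agent running on the host.
    ${RUN:-} service vmware-vpxa restart
}
```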
4) VM configuration files not in the same datastore as the VMDK disk
Before some SAN work I noticed on a report that some VMs' configuration files were located in a folder on a different datastore to the VMDK, which was going to make the work trickier. Since I was able to power off the VMs in question, I could use the Migrate wizard and the Advanced button on the disk page to move the configuration files into the same folder on the same datastore as the VMDK file. Much tidier!
5) Find all VMs with an RDM
If you need to find which of your VMs have an RDM connected, there’s a great PowerShell script from LucD:
http://communities.vmware.com/message/1063909;jsessionid=791355B6149788C1EBDA7F25B2A2B270
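LucD’s script works through PowerShell. If you are already in the Service Console, a rough alternative (a different technique, not LucD’s) is to search the VMDK descriptor files on your datastores for the disk types used by RDMs. This sketch rests on an assumption: that RDM descriptors carry `createType="vmfsRawDeviceMap"` (virtual compatibility) or `createType="vmfsPassthroughRawDeviceMap"` (physical compatibility). The function name `find_rdm_descriptors` is hypothetical.

```shell
#!/bin/bash
# Sketch: list VMDK descriptor files that point at a Raw Device Mapping.
# Assumption: RDM descriptors contain createType="vmfsRawDeviceMap" (virtual
# compatibility) or createType="vmfsPassthroughRawDeviceMap" (physical).

find_rdm_descriptors() {
    local root="${1:-/vmfs/volumes}"
    # Only read small files: descriptors are tiny text files, whereas -flat.vmdk
    # files and RDM mapping files report sizes roughly as large as the disk.
    # find does not follow the datastore-name symlinks, so each volume is
    # scanned once, via its UUID directory.
    find "$root" -type f -name '*.vmdk' -size -64k \
        -exec grep -lE 'createType="vmfs(Passthrough)?RawDeviceMap"' {} +
}
```

Usage: `find_rdm_descriptors` scans /vmfs/volumes by default; the folder in each path it prints is normally named after the VM.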
6) Run the Dell DSET utility to produce a hardware fault report
If you have a hardware fault with one of their servers, Dell may require you to run a DSET report and send them the output. Download the Linux version and copy it to your ESX server; an easy way to transfer the files is with a utility like the great (and free) Veeam FastSCP 3.0. Then, from the Service Console, navigate to where you copied the file and change the permissions so that you can execute it, since by default you most likely won’t be able to:
chmod +x delldset_v1.7.0.119.bin
then run the utility
./delldset_v1.7.0.119.bin
Choose option 2 to run the report, and once it completes you can use FastSCP to copy the output file somewhere you can easily get to it.
7) Updating the BIOS of a Dell 2950 running ESX
OK, so my first thought was that since it’s not running Windows I’d just boot from a floppy with the BIOS update, like the old days. But these servers have no floppy drive and I had no USB one to hand, so I went the better way really: I downloaded the Linux version of the update, and it was pretty easy.
http://www.tonywilko.net/blog/?p=3
8) Unable to format a LUN
The storage team provisioned some new LUNs for me, but when I tried to configure them with VMFS through the Add Storage Wizard, some of them showed a blank Available field, and on the subsequent screen I received the error ‘Unable to read partition information’ and was not able to add the storage.
Luckily my colleague Alan Renouf had returned from holiday and pointed me to a post on his blog where he had experienced the same issue; the fix worked for me too.
If I pick up any more tips before the rest of the team get back, I’ll be sure to let you know.