ESXi 4.0 Slow Boot Times When Hosting Passive MSCS Nodes With RDM LUNs
During the initial stages of upgrading a number of VMware hosts from ESX 3.5 U5 to ESXi 4.0 U2, boot times rose from the normal few minutes (most of which is Dell hardware checks) to around 12 minutes.
In particular, the host appeared to hang for 5 minutes while the following was displayed on screen:
Loading module multiextent
This only happened after the install was completed and the host was reconnected to the Fibre Channel SAN; otherwise boot times were normal. Boot times were also fine on ESX 3.5 U5 when connected to the same SAN.
Some research led me to the blog post below, which describes how this can occur when the hosts are part of a cluster containing passive MSCS nodes with RDM LUNs.
http://www.vstable.com/tag/slow/
I recommended changing the Scsi.UWConflictRetries Advanced Setting to its minimum value of 80, and the boot time dropped to around 5 minutes, slightly longer than before but much better.
Of course you could also make this change in PowerCLI using the following:
Get-VMHost test01 | Set-VMHostAdvancedConfiguration -Name Scsi.UWConflictRetries -Value 80
Watch out: the name of the Advanced Setting appears to be case-sensitive.
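If you want to check the current value first, or confirm the change took effect, you can read the setting back. Here is a minimal sketch, assuming the same host name test01 used above:
# Read back the setting to verify the value (the name must match case exactly)
Get-VMHost test01 | Get-VMHostAdvancedConfiguration -Name Scsi.UWConflictRetries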
Update: 09/02/11
In ESXi 4.1 the Scsi.UWConflictRetries setting has been removed and replaced with Scsi.CRTimeoutDuringBoot; see the updated KB article http://kb.vmware.com/kb/1016106. The article recommends setting Scsi.CRTimeoutDuringBoot to 1. You can do this with PowerCLI like this:
Get-VMHost test01 | Set-VMHostAdvancedConfiguration -Name Scsi.CRTimeoutDuringBoot -Value 1
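Since this affects every host in a cluster hosting passive MSCS nodes, you may want to apply the setting to all hosts in one pass. A quick sketch, assuming a hypothetical cluster named MSCS-Cluster:
# Apply the ESXi 4.1 setting to every host in the cluster
Get-Cluster MSCS-Cluster | Get-VMHost | Set-VMHostAdvancedConfiguration -Name Scsi.CRTimeoutDuringBoot -Value 1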