Using PowerShell to Monitor VMware Guests - on a Budget...
…i.e. a budget of £0.
(Update 28/01/09: we got some feedback about this post. The reason we are not using the built-in alerts in the VI Client is that the CPU alerts were not granular enough for us in this case.)
This all stemmed from trying to track down which process was causing particular servers’ CPU to hit 100% for a period. First of all, my colleague and Get-Scripting co-host Alan Renouf and I traded a script back and forth, which ended up as the CheckHighCPU function. It is now pretty cool: it comes back with a list of processes sorted by how much CPU they are using and, very importantly for our circumstances, the owner of each process.
The servers in question all belong to a particular cluster in ESX. Rather than constantly monitoring the console, waiting for a VM to turn red and then running the script to track down the process, we decided to monitor them with a PowerShell script that kicks off the CheckHighCPU function when a server’s CPU hits 100% for a significant enough period and emails a warning with the process details. And so the script below was born.
OK, it’s not Operations Manager and, to be honest, it’s not really production quality, but it does the job we need it to.
The VI Toolkit from VMware is a great set of cmdlets you can plug into your PowerShell console to manage your VMware environment. You can use the Get-Cluster and Get-VM cmdlets to return all the VMs in that cluster as objects. You can then use the very handy Get-Stat cmdlet to retrieve performance data for each VM.
In this case we check the cpu.usage.average statistic over the last few minutes (watch out for the -IntervalMins parameter: it can produce a period slightly different from what you would expect) and, if it’s 99% or higher, run the CheckHighCPU function and send the results by email.
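The threshold test can be tried out without a vCenter connection. This is only a minimal sketch: it assumes the samples carry a numeric Value property, as the ones from Get-Stat do, and it needs PowerShell 3.0 or later for [pscustomobject]. The function name Test-CpuSaturated and the sample data are hypothetical, and requiring every sample to be at or above the threshold is just one guess at what "a significant enough period" could mean.

```powershell
# Hypothetical helper: returns $true only when every sample is at or above
# the threshold, so a single brief spike does not trigger an alert.
function Test-CpuSaturated {
    param (
        [Parameter(Mandatory = $true)] [object[]] $Samples,
        [double] $Threshold = 99
    )
    foreach ($sample in $Samples) {
        if ($sample.Value -lt $Threshold) { return $false }
    }
    return $true
}

# Made-up samples standing in for Get-Stat output.
$spike     = @([pscustomobject]@{ Value = 100 }, [pscustomobject]@{ Value = 42 })
$sustained = @([pscustomobject]@{ Value = 100 }, [pscustomobject]@{ Value = 99.5 })

Test-CpuSaturated -Samples $spike       # one sample is below 99
Test-CpuSaturated -Samples $sustained   # both samples are at or above 99
```

With -MaxSamples 1, as in our script, there is only ever one sample, so this behaves the same as the simple -ge 99 comparison.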
We then make the script sleep for a short time period so that we are not constantly bombarded with alerts if a warning is triggered.
Obviously, if we wanted to make it production quality we would add some error checking and a test to see whether an alert had recently been sent, but for the time being it’s doing a great job for the requirements that exist.
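As a rough idea of what the "has an alert recently been sent" test might look like, here is a minimal sketch. It is not part of our script: the function name Test-AlertDue, the 15 minute cooldown and the server name are all assumptions for illustration, and the -Now parameter is only there so the logic can be exercised without waiting.

```powershell
# Hypothetical cooldown tracker: remembers when each server last alerted.
$script:lastAlert = @{}

function Test-AlertDue {
    param (
        [Parameter(Mandatory = $true)] [string] $ServerName,
        [int] $CooldownMinutes = 15,
        [datetime] $Now = (Get-Date)
    )
    if ($script:lastAlert.ContainsKey($ServerName)) {
        $elapsed = $Now - $script:lastAlert[$ServerName]
        if ($elapsed.TotalMinutes -lt $CooldownMinutes) { return $false }
    }
    # Record this alert so later calls inside the cooldown are suppressed.
    $script:lastAlert[$ServerName] = $Now
    return $true
}

$start = [datetime]'2009-01-28T09:00:00'
Test-AlertDue -ServerName 'server01' -Now $start                    # first alert for this server
Test-AlertDue -ServerName 'server01' -Now ($start.AddMinutes(5))    # still inside the cooldown
Test-AlertDue -ServerName 'server01' -Now ($start.AddMinutes(20))   # cooldown has passed
```

In the monitoring loop you would wrap the call to CheckHighCPU in something like `if (Test-AlertDue -ServerName $VMname) { ... }`.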
You could obviously re-use the script below to monitor different statistics offered by Get-Stat, such as disk or memory.
Function EmailWarning {
    param ($ServerName, $Attachment)

    #Email warning
    Write-Output "Creating E-Mail Structure"
    $smtpServer = "servername"

    $msg = New-Object Net.Mail.MailMessage
    $att = New-Object Net.Mail.Attachment($Attachment)
    $smtp = New-Object Net.Mail.SmtpClient($smtpServer)

    $msg.From = "sender"
    $msg.To.Add("recipient")
    $msg.Subject = "Server Warning - High CPU on $ServerName"
    $msg.Body = "$ServerName has a CPU value of $HighCPU %"
    $msg.Attachments.Add($att)

    Write-Output "Send E-Mail"
    $smtp.Send($msg)

    $att.Dispose()
}
Function CheckHighCPU {
    param ($Target)

    # Raw per-process counters from the target: the _Total instance and everything else
    $procs_total = Get-WmiObject -Class Win32_PerfRawData_PerfProc_Process -Filter 'name="_total"' -ComputerName $Target
    $procs = Get-WmiObject -Class Win32_PerfRawData_PerfProc_Process -Filter 'name<>"_total"' -ComputerName $Target

    [int64]$totalpercentuser = 0
    foreach ($proc in $procs_total) {
        $totalpercentuser = $totalpercentuser + $proc.PercentUserTime
    }

    [decimal]$perc = [System.Convert]::ToDecimal($totalpercentuser)

    # Keep every process using more than 1% of the total, along with its owner
    $myCol = @()
    foreach ($proc in $procs) {
        $proc_perct = (($proc.PercentUserTime / $perc) * 100)
        if ($proc_perct -gt 1) {
            $Process = Get-WmiObject win32_process -ComputerName $Target | where {$_.ProcessID -eq $proc.IDProcess}
            $MYInfo = "" | Select-Object Name, CPUUsage, Owner, ProcessID
            $MYInfo.Name = $proc.Name
            $MYInfo.ProcessID = $proc.IDProcess
            $MYInfo.CPUUsage = [Math]::Round($proc_perct, 0)
            $MYInfo.Owner = $Process.GetOwner().User
            $myCol += $MYInfo
        }
    }

    # $file and $VMname are set by the calling loop further down
    $myCol | Sort-Object CPUUsage -Descending | Out-File $file
    EmailWarning $VMname $file
}
Connect-VIServer servername
$vms = Get-Cluster clustername | Get-VM

# Loop until 17:00, re-reading the clock on each pass
# (a time captured once before the loop would never change)
do {
    foreach ($vm in $vms) {
        $VMname = $vm.Name
        $filename = $VMname + '.txt'
        $file = "C:\Scripts\$filename"
        $stats = Get-Stat -Entity $vm -IntervalMins 2 -Stat cpu.usage.average -MaxSamples 1
        Write-Host $VMname $stats

        if ($stats.Value -ge 99) {
            $HighCPU = $stats.Value
            Write-Host "Warning!" -ForegroundColor Red
            CheckHighCPU $VMname
        }
    }

    Start-Sleep -Seconds 30
}
until ((Get-Date).Hour -ge 17)
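If you do want to watch more than one statistic, one option is to keep the thresholds in a table and filter the samples against it. The following is a minimal sketch rather than something we run: the helper name Get-ThresholdBreach, the threshold numbers and the sample data are all made up, and it assumes each sample exposes MetricId and Value properties in the way the Get-Stat output does. It needs PowerShell 3.0 or later for [pscustomobject].

```powershell
# Assumed thresholds per statistic; the numbers are illustrative only.
$thresholds = @{
    'cpu.usage.average' = 99
    'mem.usage.average' = 90
}

# Hypothetical helper: returns the samples whose value meets or exceeds
# the threshold configured for their metric.
function Get-ThresholdBreach {
    param (
        [Parameter(Mandatory = $true)] [object[]] $Samples,
        [Parameter(Mandatory = $true)] [hashtable] $Thresholds
    )
    $Samples | Where-Object {
        $Thresholds.ContainsKey($_.MetricId) -and $_.Value -ge $Thresholds[$_.MetricId]
    }
}

# Made-up samples standing in for Get-Stat output.
$samples = @(
    [pscustomobject]@{ MetricId = 'cpu.usage.average'; Value = 45 }
    [pscustomobject]@{ MetricId = 'mem.usage.average'; Value = 93 }
)

Get-ThresholdBreach -Samples $samples -Thresholds $thresholds
```

In the real loop, $samples would come from a Get-Stat call that requests each statistic named in the table, and any breach returned could be handed to the same email function.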