Product: Veeam Management Pack for VMware (SCOM); Veeam Smart Plug-in (SPI) for VMware; Veeam Monitor; Veeam ONE
Version: All
Published: 2011-07-14
Created: 2011-07-14
Last Modified: 2013-01-15
One of the Veeam monitoring products (nWorks SPI, MP, Monitor, etc.) reports that a host's hardware status has changed, with a sensor state of «Sensor VMware Rollup Health State equal Unknown», «Sensor VMware Rollup Health State equal Red», or «Sensor VMware Rollup Health State equal Yellow».
These alerts are useful when hosts in your environment have hardware issues: the alert identifies the issue and rates its severity on VMware's color scale (Yellow: something is wrong but there is no risk of data loss; Red: potential data loss or production down; Unknown: the current status of the sensor cannot be determined).
The problem arises when you resolve the alert on the host and the host returns to «Normal operating conditions» or «Green», but Veeam products continue to report the original problem. Other symptoms include the examples listed below.
You may experience the following problems with nWorks products:
«Sensor VMware Rollup Health State equal Unknown» messages are displayed on a standalone host within a cluster, and there are no alerts on this matter.
In Veeam Monitor, you may notice that hardware status warnings are visible using vCenter, but there are no hardware related alarms visible in Veeam Monitor.
Veeam monitoring products pull the hardware status information of the monitored object through a connection to the MOB (Managed Object Browser). The VMware vSphere client uses a different method to obtain this type of data. Because of this difference, you may see different information in the VMware vSphere client and in Veeam monitoring products.
NOTE: The current, correct hardware status of monitored objects is always available via the VMware MOB. For the most accurate information, the hardware (the host in this case) always takes precedence over vCenter (VC).
To narrow down the issue, compare the hardware status information for the monitored objects using both the VMware vCenter MOB and the host's MOB.
How to check the hardware sensors using VMware MOB:
1. Open the VMware vCenter server’s MOB web link using your Internet browser (https://[your_vCenter_server_address]/mob) and follow this path:
content -> rootFolder -> childEntity -> hostFolder -> childEntity -> host [select appropriate host] -> runtime -> healthSystemRuntime -> systemHealthInfo -> numericSensorInfo
2. Find HostNumericSensorInfo related to VMware Rollup Health State. Make sure that the summary string is “Sensor is operating under normal conditions” and the label string is “Green”.
In the example screenshot, this host has a problem according to the information provided in the vCenter server's MOB (VMware Rollup Health State is Red). What we were expecting to see is the «Green» status with the sensor operating under normal conditions.
3. Then open the VMware HOST’s MOB web link using your Internet browser (https://[your_VMware_host_address]/mob) and follow this path:
content -> rootFolder -> childEntity -> hostFolder -> childEntity -> host -> runtime -> healthSystemRuntime -> systemHealthInfo -> numericSensorInfo
4. Find HostNumericSensorInfo related to the VMware Rollup Health State. Make sure that the summary string is “Sensor is operating under normal conditions” and the label string is “Green”.
In the example screenshot, this host is NOT having a problem according to the information provided in the host's MOB (VMware Rollup Health State is Green).
5. Please make sure that vCenter’s and Host’s MOBs show you the same status/summary for the VMware Rollup Health State.
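The comparison in steps 1 to 5 can be sketched in a few lines of Python. This is only an illustration, not part of any Veeam or VMware tool: it assumes you have copied the sensor name and the label and summary strings (shown under the sensor's health state in the MOB) into plain dictionaries by hand. The function names and the sample values are hypothetical.

```python
ROLLUP_NAME = "VMware Rollup Health State"


def find_rollup(sensors):
    """Return the first sensor record whose name mentions the rollup sensor."""
    for sensor in sensors:
        if ROLLUP_NAME in sensor.get("name", ""):
            return sensor
    return None


def compare_rollup(vcenter_sensors, host_sensors):
    """Compare the rollup label/summary seen via vCenter with the host's own."""
    vc = find_rollup(vcenter_sensors)
    host = find_rollup(host_sensors)
    if vc is None or host is None:
        return {"match": False, "reason": "rollup sensor not found in both lists"}
    same = (vc["label"], vc["summary"]) == (host["label"], host["summary"])
    return {"match": same, "vcenter": vc["label"], "host": host["label"]}


if __name__ == "__main__":
    # Hypothetical values mirroring the mismatch described in the steps above.
    vcenter = [{"name": ROLLUP_NAME, "label": "Red",
                "summary": "Sensor reports a fault"}]
    host = [{"name": ROLLUP_NAME, "label": "Green",
             "summary": "Sensor is operating under normal conditions"}]
    print(compare_rollup(vcenter, host))
```

A result with "match" set to False corresponds to the situation in which the two MOBs disagree.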
If you see any difference between the statuses shown in the VMware vSphere client and the VMware MOB (as in the example above), please open a support case with VMware's support team.
Please note that for Memory and Storage, the hardware sensors pull their data from additional sections of the MOB.
Here are the paths for Memory:
content -> rootFolder -> childEntity -> hostFolder -> childEntity -> host [select appropriate host] -> runtime -> healthSystemRuntime -> hardwareStatusInfo -> memoryStatusInfo
Open the VMware HOST’s MOB web link using your Internet browser (https://[your_VMware_host_address]/mob) and follow this path:
content -> rootFolder -> childEntity -> hostFolder -> childEntity -> host -> runtime -> healthSystemRuntime -> hardwareStatusInfo -> memoryStatusInfo
Here are the paths for Storage:
content -> rootFolder -> childEntity -> hostFolder -> childEntity -> host [select appropriate host] -> runtime -> healthSystemRuntime -> hardwareStatusInfo -> storageStatusInfo
Open the VMware HOST’s MOB web link using your Internet browser (https://[your_VMware_host_address]/mob) and follow this path:
content -> rootFolder -> childEntity -> hostFolder -> childEntity -> host -> runtime -> healthSystemRuntime -> hardwareStatusInfo -> storageStatusInfo
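Instead of clicking through each level, the part of a path after "host" can usually be opened directly by passing it to the MOB as a dotted property path. The sketch below builds such links. It assumes that the MOB accepts the moid and doPath query parameters, and the addresses and object IDs are hypothetical: a host's ID under vCenter is environment specific, while "ha-host" is the ID commonly seen on a standalone ESX/ESXi host. Verify both in your own MOB before relying on them.

```python
from urllib.parse import urlencode

# Property paths taken from the navigation steps above (everything after "host").
SENSOR_PATHS = {
    "rollup": "runtime.healthSystemRuntime.systemHealthInfo.numericSensorInfo",
    "memory": "runtime.healthSystemRuntime.hardwareStatusInfo.memoryStatusInfo",
    "storage": "runtime.healthSystemRuntime.hardwareStatusInfo.storageStatusInfo",
}


def mob_url(address, host_moid, kind):
    """Build a direct MOB link to one health property of a host object."""
    query = urlencode({"moid": host_moid, "doPath": SENSOR_PATHS[kind]})
    return f"https://{address}/mob/?{query}"


if __name__ == "__main__":
    # Hypothetical addresses and object IDs, for illustration only.
    for kind in SENSOR_PATHS:
        print(mob_url("vcenter.example.local", "host-42", kind))
        print(mob_url("esx01.example.local", "ha-host", kind))
```

Opening the vCenter link and the host link for the same property side by side makes the comparison quicker than walking the tree twice.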
If you see differences between vCenter’s and the HOST’s MOB, it is strongly recommended that you open a support case with the VMware Support team in order to get the issue resolved.
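Memory and storage are reported as lists of elements rather than a single rollup value, so a per-element comparison is handy when preparing a support case. The following is a minimal sketch, assuming each element has been copied from the MOB into a dictionary with a name and a status label. The dictionary layout, the function name, and the sample data are assumptions made for illustration.

```python
def diff_status(vcenter_elements, host_elements):
    """Return {name: (vcenter_label, host_label)} for elements whose labels differ.

    An element present on only one side is reported with None for the other.
    """
    vc = {e["name"]: e["label"] for e in vcenter_elements}
    host = {e["name"]: e["label"] for e in host_elements}
    diffs = {}
    for name in sorted(vc.keys() | host.keys()):
        pair = (vc.get(name), host.get(name))
        if pair[0] != pair[1]:
            diffs[name] = pair
    return diffs


if __name__ == "__main__":
    # Hypothetical element names and labels.
    vcenter_memory = [{"name": "Memory Module 1", "label": "Yellow"},
                      {"name": "Memory Module 2", "label": "Green"}]
    host_memory = [{"name": "Memory Module 1", "label": "Green"},
                   {"name": "Memory Module 2", "label": "Green"}]
    for name, (vc_label, host_label) in diff_status(vcenter_memory, host_memory).items():
        print(f"{name}: vCenter={vc_label}, host={host_label}")
```

An empty result means the two MOBs agree for every element that was supplied.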
NOTE: Veeam has also found that simply putting the host into maintenance mode and then exiting maintenance mode can address the problem. We still suggest that you open a support case with the VMware Support team on this matter.
NOTE: For additional troubleshooting, you can try the steps below to resolve this issue, but do so at your own risk. If anything fails, or if this doesn’t resolve the issue, you will still need to contact VMware support.
On the VC, do the following to resolve this conflict:
1. Disable EVC on the cluster.
2. vMotion the virtual machines over to a secondary node.
3. Enter maintenance mode / evict the «faulted» node from the cluster.
4. Remove the «faulted» node from vCenter.
5. Log into the «faulted» node via ILOM and restart the management agents.
6. Re-add the node to vCenter.
7. Re-add the node to the cluster.
8. Re-enable EVC.
Please note that if some steps do not apply to your environment (EVC, clustering, etc.), you can skip them. The main idea is to remove all VMs from the host, remove the host from the cluster/VC, restart the host (or the management agents), then add the host back into the VC/cluster. To resolve the issue, this process must be carried out one host at a time.
For additional information regarding hardware monitoring, check out the «vSphere Client Hardware Health Monitoring» whitepaper from VMware (4.1). http://www.vmware.com/files/pdf/techpaper/hwhm41_technote.pdf
