Johnny's Random Tips: VMware ESX server logs

Johnny Zhang

There are many useful log files on a ESX server to help you find what cause a problem. However, there may be to many of them how do you know which one to look at? I will base on my own experiences to explain what I think is important. I will start with HA logs. HA is configured on vCenter. However, once configured it will function without vCenter. It will build it's own HA domain (This domain always use the name 'vmware'). If you have more than 5 servers in a HA cluster, there will be 5 primaries and the rest will be secondary. The log file location after ESX 3.5 U2 is at /var/log/vmware/aam directory

The most impportant log would be "vmware_server_name.log". You can find many HA related info or errors from here.
Even with 5 primaries, there will be only one cluster manager which is the one holding all the rules. You can find which one is the cluster manager by

less vmware_server_name.log | grep -i "VMWareClusterManager submitted to run on node "| uniq

You can see how HA changed the cluster manager from host to host, and the last record is the current manager.
You can also find out if there was a isolation event happened by
less vmware_bs-bcs-h132.log | grep -i ISOLATED

When HA detect heartbeat failure on one host, it will try to find out if this is a agent failure or host failure. The isolation event will only trigger when there is a host failure. HA does this by ping the host after the heartbeat is gone
less vmware_bs-bcs-h132.log | grep -i "Ping Node results:"

When the host replys the ping it will mark as ALIVE. When the host is not responding it will mark as DEAD and HA event will trigger. (More to come in the future)

Labels: ESX 3.x Service Console Commands, Logs, VI 3 and vSphere 4

1 Response

Anonymous Says:

May 5, 2015 at 5:26 AM

thanks you verymuch, it was helpful to me.

SureshK

Post a Comment

Labels

Blog Archive

Followers