Question: Have you installed vCenter 6.5 in HA mode (a new feature)? How do you troubleshoot an issue with vCenter availability? What happens if failover is not triggered in HA? Can you still reach the vCenter Server on its regular IP address?
Answer: It has been a long time since I posted a scenario-based question. You may not get a question this detailed from an interviewer, but I have tried to cover the aspects that could be asked. VMware introduced vCenter HA in vSphere 6.5 and included it in the vCenter Server Standard license, so there is no additional cost for this feature 🙂 and thanks to VMware for not making the licensing model any more complex. This feature is available exclusively for the vCenter Server Appliance (VCSA). When vCenter HA is enabled, a three-node vCenter Server cluster (Active, Passive, and Witness nodes) is deployed. vCenter HA provides an RTO of about 5 minutes for vCenter Server, greatly reducing the impact of host, hardware, and application failures through automatic failover between the Active and Passive nodes.
vCenter HA can also be enabled, disabled, or destroyed at any time, allowing customers to easily take advantage of this new capability. There is also a maintenance mode that prevents planned maintenance from causing an unwanted failover. That is a lot of theory, and the well-known vCenter HA diagram sums it up. Do not confuse vCenter HA with the cluster-level HA feature, which has been around for a long time. The HA feature is always tricky because it creates additional machines. Let's focus our explanation on a scenario where vCenter HA itself has failed.
You can begin the story with a production scenario: our vCenter Server is not responding even though it is configured in HA mode. Upon investigation, we notice that vCenter is not on the network and cannot be managed remotely. When we connect directly to the ESXi host to check the vCenter console, the appliance is up and running but not reachable on the network. When we run the ifconfig command at the console, only the private IP address is listed; the management IP address is missing. It appears that vCenter can still exchange heartbeats with its Witness node, so no failover has taken place. The result is a production outage: vCenter is unavailable, and any software that depends on vCenter, such as backups and disaster recovery, is affected. That is a good explanation of the problem; now let's cover the recovery steps.
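The check described above (private IP present, management IP missing) can be sketched in a few lines of Python. This is only an illustration under my own assumptions: it parses the one-line output format of `ip -o -4 addr show` rather than ifconfig, it assumes eth0 is the management adapter and eth1 is the vCenter HA adapter, and the sample text and addresses are made up.

```python
from collections import defaultdict


def ipv4_by_interface(ip_output: str) -> dict:
    """Map interface name -> IPv4 addresses found in `ip -o -4 addr show` text."""
    found = defaultdict(list)
    for line in ip_output.splitlines():
        fields = line.split()
        # Expected shape: "<index>: <ifname> inet <addr>/<prefix> ..."
        if len(fields) >= 4 and fields[2] == "inet":
            found[fields[1]].append(fields[3].split("/")[0])
    return dict(found)


def interfaces_without_ip(ip_output: str, expected: list) -> list:
    """Return the expected interfaces that currently hold no IPv4 address."""
    present = ipv4_by_interface(ip_output)
    return [name for name in expected if name not in present]


# Hypothetical console output for the scenario: only the HA address is up.
SAMPLE = """\
1: lo    inet 127.0.0.1/8 scope host lo\\       valid_lft forever preferred_lft forever
3: eth1    inet 192.168.100.11/24 brd 192.168.100.255 scope global eth1\\       valid_lft forever preferred_lft forever
"""

# Assumption: eth0 = management network, eth1 = vCenter HA network.
print(interfaces_without_ip(SAMPLE, ["eth0", "eth1"]))  # ['eth0']
```

If the management interface shows up in the result, the appliance is running but has no address on the management network, which matches the symptom in this scenario.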
If all nodes in a vCenter HA cluster cannot communicate with each other, the Active node stops serving client requests.
Node isolation is a network connectivity problem.
- Attempt to resolve the connectivity problem. If you can restore connectivity, isolated nodes rejoin the cluster automatically and the Active node starts serving client requests.
- If you cannot resolve the connectivity problem, you have to log in to the Active node's console directly.
- Power off and delete the Passive node and the Witness node virtual machines.
- Log in to the Active node by using SSH or through the Virtual Machine Console.
- To enable the Bash shell, enter shell at the appliancesh prompt.
- Run the command that removes the vCenter HA configuration (the exact command is given in VMware's documentation for recovering isolated vCenter HA nodes).
- Reboot the Active node. The Active node is now a standalone vCenter Server Appliance.
- Perform vCenter HA cluster configuration again.
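To make the isolation rule and the two recovery paths above easier to remember, here is a small Python sketch. It is a mental model, not anything VMware ships: the class and function names are hypothetical, and the rule that one reachable peer is enough for the Active node to keep serving is my assumption about the partial-connectivity cases, since the text above only covers full isolation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActiveNodeView:
    """Links as seen from the Active node over the vCenter HA network."""
    reaches_passive: bool
    reaches_witness: bool


def active_keeps_serving(view: ActiveNodeView) -> bool:
    # Assumption: one reachable peer is enough to stay in the cluster;
    # only full isolation makes the Active node stop serving clients.
    return view.reaches_passive or view.reaches_witness


def recovery_path(connectivity_restored: bool) -> list:
    """Return the ordered steps from the list above for either outcome."""
    if connectivity_restored:
        return ["nodes rejoin automatically", "Active node serves clients again"]
    return [
        "power off and delete Passive and Witness VMs",
        "log in to Active node (SSH or VM console)",
        "enable Bash shell",
        "remove vCenter HA configuration",
        "reboot Active node",
        "configure vCenter HA again",
    ]


isolated = ActiveNodeView(reaches_passive=False, reaches_witness=False)
print(active_keeps_serving(isolated))  # False
for step in recovery_path(connectivity_restored=False):
    print("-", step)
```

In the scenario from this post the Active node still reaches its peers over the HA network, so the function returns True and no failover is expected, even though clients cannot reach the management IP.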
These steps help you bring vCenter back online on the management network. A tip that is not listed in VMware's official documentation: you need to use VMRC to set the network adapter status back to Connected 😉. I found that the vCenter VM has two network adapters, and only the heartbeat adapter was connected while the management adapter was not. I tried a few Linux commands to bring the interface online, but the real fix is in the VM's Edit Settings dialog. I still need to find the same setting in the HTML client for ESXi, but for now you can give the VMRC console as your answer. That's it for this post; I will publish more troubleshooting scenarios to share my knowledge.