Question: How do you troubleshoot an ESXi/ESX host that is disconnected or not responding in your infrastructure?
Hint: The interviewer wants to gauge your skill with common ESX/ESXi issues in VMware.
This is a common problem that every VMware administrator faces. Most get stuck at network troubleshooting because they assume the disconnect was caused by connectivity problems. That approach is correct, but only for a few scenarios.
Let us discuss the various reasons for a host disconnect that the interviewer expects you to know 🙂
1) Verify that the ESXi host is powered on – it sounds silly, but most people forget to check the server status through remote management cards such as iLO/DRAC/RIB, etc.
2) Verify that network connectivity exists from vCenter Server to the ESXi host by both IP address and FQDN – this is the first thing most VMware administrators suspect
3) Verify that the ESXi host can be reconnected, and whether reconnecting it resolves the issue – simple, and sometimes it resolves the problem
4) Verify that the ESXi host is able to respond back to vCenter Server at the correct IP address. If vCenter Server does not receive heartbeats from the ESXi host, the host goes into a not responding state. To verify that the correct Managed IP Address is set, see the VMware KB articles "Verifying the vCenter Server Managed IP Address" and "ESXi 5.0 hosts are marked as Not Responding 60 seconds after being added to vCenter Server" (a known issue).
5) Check whether the host disconnects right after it is added or connected to the inventory; see "ESXi/ESX host disconnects from vCenter Server after adding or connecting it to the inventory" (VMware KB2040630)
6) Verify that you can connect from vCenter Server to the ESXi host on TCP/UDP port 902. If the host was upgraded from version 2.x and you cannot connect on port 902, then verify that you can connect on port 905.
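Use a simple Telnet command to check the port status, for example: telnet <ESXi host IP or FQDN> 902. Note that Telnet only tests the TCP side of the port.
7) Verify whether restarting the ESXi Management Agents resolves the issue – you have probably run these commands many times, right? 😉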
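Run these commands from the ESXi Shell or an SSH session to restart the host agent (hostd) and the vCenter Server agent (vpxa):
/etc/init.d/hostd restart
/etc/init.d/vpxa restart
To restart all management agents on the host, run the command:
services.sh restart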
8) ESXi hosts can disconnect from vCenter Server because of underlying storage issues – this is a complex one to explain, but to fix these issues you should know the pain points along the whole path, from the HBA card of the ESXi server to the LUN/disk on the storage box
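The name resolution check in point 2 and the port check in point 6 can be scripted from the vCenter Server side. Below is a minimal Python sketch under stated assumptions: it is not a VMware tool, the function names (resolve_host, tcp_port_open, check_host) are made up for illustration, and it only tests TCP. The UDP side of port 902 is not covered, so a passing result does not prove that heartbeats are arriving.

```python
import socket


def resolve_host(name):
    """Return the sorted IPv4 addresses that `name` resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(name, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        # Name could not be resolved (bad DNS record, typo in the FQDN, ...)
        return []
    return sorted({info[4][0] for info in infos})


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks
        return False


def check_host(fqdn, expected_ip, ports=(902,)):
    """Run the FQDN and TCP port checks and return the results as a dict.

    `fqdn` and `expected_ip` are supplied by the caller; nothing is
    looked up from vCenter Server itself (assumption of this sketch).
    """
    resolved = resolve_host(fqdn)
    result = {
        "resolved": resolved,
        "dns_matches": expected_ip in resolved,
    }
    for port in ports:
        result["tcp_%d" % port] = tcp_port_open(expected_ip, port)
    return result
```

For example, check_host("esxi01.example.local", "192.0.2.10") (a hypothetical host name and address) returns a dict showing what the name resolved to, whether that matches the IP you expected, and whether TCP port 902 accepted a connection. Pass ports=(902, 905) for a host upgraded from version 2.x.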
These points are sufficient to keep the interviewer happy. Let us discuss more real-time scenarios in the next post …