Question: How do you troubleshoot vMotion failures for a virtual machine?
Hint: The interviewer wants to gauge your skill in handling common VMware virtual machine issues.
Answer:
How often do you see the messages below on your vCenter Server? vMotion failing at 10%, 14%, 82%, or 90 to 95%:
You are attempting a vMotion migration between two ESX/ESXi hosts, and the vMotion task reaches 14%, then times out with this error message:
The vMotion migrations failed because the ESX hosts were not able to connect over the vMotion network. Check the vMotion network settings and physical network configuration.
vMotion fails at 82%: you cannot migrate a virtual machine with vMotion.
In the /var/log/vmware/hostd.log file on the source ESX host, you see the error:
ResolveCb: Failed with fault: (vmodl.fault.SystemError) {
reason = "Source detected that destination failed to resume."
msg = ""
}
vMotion fails at 90-95%: you cannot perform a vMotion, and vMotion times out. In vCenter Server, you see the error:
Operation timed out
vMotion stops at 90%, then fails with the error:
a general system error occurred: failed to resume on destination
VMware vMotion fails at 10%: vMotion times out. VirtualCenter/vCenter Server reports these errors:
A general system error occurred: Failed waiting for data. Error 16. Invalid argument
A general system error occurred: failed to look up VMotion destination resource pool object
Enough of the error messages; let us see how to answer this question in the interview:
The rule of thumb for vMotion failures: if the operation fails below 15%, you can assume a network or configuration issue.
You can tell the interviewer that these are the settings to verify for any vMotion failure:
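The rule of thumb and the error messages quoted earlier can be collected into a small lookup, which is handy when rehearsing the answer. This is only a sketch in Python: the function name likely_cause and its return strings are illustrative and not part of any VMware tool. The below-15% rule comes from the text; the other branches simply echo the error messages quoted above and are not an official classification.

```python
# Hypothetical helper (not a VMware API): maps the percentage at which a
# vMotion task failed to the symptom described in this article.

def likely_cause(percent: int) -> str:
    """Return the area to investigate first for a failure at `percent`."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    if percent < 15:
        # Rule of thumb from the article: early failures point at the network.
        return "network/configuration"
    if percent == 82:
        # hostd.log: "Source detected that destination failed to resume."
        return "destination failed to resume"
    if 90 <= percent <= 95:
        # vCenter: "Operation timed out" / "failed to resume on destination"
        return "timeout or failed resume on destination"
    # The article gives no guidance for other percentages.
    return "no rule of thumb; check hostd.log and vCenter errors"


for pct in (10, 14, 82, 93):
    print(f"{pct}% -> {likely_cause(pct)}")
```

Anything outside the listed stages falls through to the last branch, so the helper never claims more than the article does.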
- Ensure that vMotion is enabled on all ESX/ESXi hosts.
- Determine if resetting the Migrate.Enabled setting on both the source and destination ESX or ESXi hosts addresses the vMotion failure
- Verify that VMkernel network connectivity exists using vmkping
- Verify that VMkernel networking configuration is valid
- Verify that the virtual machine is not configured to use a device that is not valid on the target host
- If Jumbo Frames are enabled (MTU of 9000), ensure that vmkping is run with a payload of 8972 bytes (9000 minus 8 bytes for the ICMP header and 20 bytes for the IP header), for example vmkping -d -s 8972 <destinationIPaddress>. You may experience problems if the trunk between two physical switches is misconfigured with an MTU of 1500
- Verify that Name Resolution is valid on the host
- Verify that Console OS network connectivity exists
- Verify if the ESXi/ESX host can be reconnected or if reconnecting the ESX/ESXi host resolves the issue
- Verify that the required disk space is available
- Verify that time is synchronized across the environment
- Verify that valid limits are set for the virtual machine being vMotioned
- Verify that hostd is not spiking the console
- This issue may be caused by SAN configuration. Specifically, this issue may occur if zoning is set up differently on different servers in the same cluster
- Verify and ensure that the log.rotateSize parameter in the virtual machine’s configuration file is not set to a very low value
- If the virtual machine being vMotioned is a 64-bit virtual machine, verify that the VT option is enabled on both of your ESX hosts.
- Restart the host management agents
- Verify that host management agents are not spiking the Service Console (ESX only)
- Verify that there are no issues with the shared storage.
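The Jumbo Frames arithmetic in the checklist can be worked through in a few lines. The sketch below, in Python, derives the vmkping payload size from the MTU and builds the command string using the -d and -s options mentioned above. The function names are illustrative, the 20-byte figure assumes an IPv4 header without options, and 192.0.2.20 is a placeholder documentation address rather than a real vMotion IP.

```python
IP_HEADER_BYTES = 20    # IPv4 header without options (assumption)
ICMP_HEADER_BYTES = 8   # ICMP echo header


def vmkping_payload_size(mtu: int) -> int:
    """Largest ICMP payload that fits in one unfragmented frame of `mtu`."""
    payload = mtu - IP_HEADER_BYTES - ICMP_HEADER_BYTES
    if payload <= 0:
        raise ValueError("MTU is too small to carry an ICMP echo")
    return payload


def vmkping_command(mtu: int, destination_ip: str) -> str:
    """Build the vmkping command line for a don't-fragment test at `mtu`."""
    return f"vmkping -d -s {vmkping_payload_size(mtu)} {destination_ip}"


print(vmkping_command(9000, "192.0.2.20"))  # vmkping -d -s 8972 192.0.2.20
print(vmkping_command(1500, "192.0.2.20"))  # vmkping -d -s 1472 192.0.2.20
```

If the 8972-byte test fails while the 1472-byte test succeeds, a device in the path (such as the switch trunk mentioned above) is probably still set to an MTU of 1500.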
To check the health of the vMotion network:
- Check for IP address conflicts on the vMotion network. Each host in the cluster should have a vMotion vmknic assigned a unique IP address.
- Check for packet loss over the vMotion network. Try having the source host ping (vmkping) the destination host’s vMotion vmknic IP address for the duration of the vMotion.
- Check that every vMotion VMkernel port group has the same security settings. A security mismatch causes a vMotion operation to fail. For example, a failure occurs if the source VMkernel port group is set to allow promiscuous mode and the destination VMkernel port group is set to disallow it.
- Check for connectivity between the two hosts. Try having the source host ping (vmkping) the destination host’s vMotion vmknic IP address.
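The health checks above boil down to three comparisons, which the following Python sketch illustrates on made-up data. Everything here is hypothetical: the host names, IP addresses, and the inventory layout are invented for the example, and in practice you would collect these values from your own hosts before comparing them.

```python
from collections import defaultdict

# Hypothetical inventory: host -> (vMotion vmknic IP, promiscuous mode allowed?)
hosts = {
    "esx01": ("10.0.0.11", False),
    "esx02": ("10.0.0.12", False),
    "esx03": ("10.0.0.11", True),   # duplicate IP and mismatched security
}


def find_ip_conflicts(inventory):
    """Return {ip: [hosts]} for every vMotion IP used by more than one host."""
    by_ip = defaultdict(list)
    for host, (ip, _promiscuous) in inventory.items():
        by_ip[ip].append(host)
    return {ip: sorted(names) for ip, names in by_ip.items() if len(names) > 1}


def security_settings_match(inventory):
    """True when every port group has the same promiscuous-mode setting."""
    return len({promiscuous for _ip, promiscuous in inventory.values()}) <= 1


def packet_loss_percent(sent, received):
    """Packet loss for a ping run, from counts of sent and received packets."""
    if sent <= 0 or not 0 <= received <= sent:
        raise ValueError("need sent > 0 and 0 <= received <= sent")
    return 100.0 * (sent - received) / sent


print(find_ip_conflicts(hosts))        # {'10.0.0.11': ['esx01', 'esx03']}
print(security_settings_match(hosts))  # False
print(packet_loss_percent(100, 97))    # 3.0
```

Only promiscuous mode is compared here because it is the example the text gives; other security settings could be added to the tuple in the same way.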
I hope this information helps you crack your interview; share it with others who need these real-time scenarios.
Happy Learning and All the Best for your Interview 🙂