Question: Have you ever fixed metadata issues with ESXi datastores? Can you list the commands or tools used for such activity? Is there a prerequisite checklist before you use such tools? Help me understand the troubleshooting procedure for this scenario.
Answer: Once the interviewer is confident in your VMware basics, he or she will move on to advanced troubleshooting topics. Answer such questions wisely by pairing the technical answer with a customer scenario. Let me help you answer this one properly with this post. You can start like this: recently a customer had a SAN outage, after which a couple of datastores did not come back online, resulting in a Major Incident/Priority One ticket. I worked with the VMware support team and used the VOMA (vSphere On-disk Metadata Analyzer) tool to check VMFS metadata consistency. This utility scans the VMFS volume metadata and highlights any inconsistencies, which you then work with the SAN team to fix as quickly as possible.
Since ESXi 5.x, you can check VMFS for metadata inconsistencies with VOMA (vSphere On-disk Metadata Analyzer). VOMA can check VMFS3 and VMFS5 datastores. Note that the tool only identifies problems, because it runs in read-only mode; it does not fix the errors it detects.
Reasons to use VOMA:
- Metadata errors reported in the vmkernel.log file
- A SAN outage
- A RAID rebuild
- A disk replacement
- A partition table update
- You cannot modify, erase, or access files on a VMFS datastore that is not in use by another host
Before you start VOMA from the CLI of your ESXi host, follow these guidelines:
- Shut down all virtual machines running on the VMFS datastore (or migrate them)
- Make sure that the VMFS volume is not in use by other hosts (best practice: unmount the datastore on the other hosts)
- Make sure that the datastore is not in use by vSphere HA for heartbeating
- Make sure that the datastore is not in use by other features such as Storage I/O Control, …
- Make sure that the volume is not a multi-extent volume
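One way to check the multi-extent item is to list the VMFS extents known to the host. The following is a minimal sketch, assuming it runs in the ESXi shell; the wrapper name list_vmfs_extents is hypothetical, and the exact column layout of the output may differ between ESXi versions. The same listing should also show the device name and partition number for each datastore.

```shell
#!/bin/sh
# Sketch only: list_vmfs_extents is a hypothetical wrapper name, not a VMware tool.
list_vmfs_extents() {
    # esxcli exists only on the ESXi host, so bail out cleanly elsewhere.
    if ! command -v esxcli >/dev/null 2>&1; then
        echo "esxcli not found: run this on the ESXi host" >&2
        return 1
    fi
    # One row per extent: a datastore that appears on more than one row
    # spans several extents.
    esxcli storage vmfs extent list
}

list_vmfs_extents || true
```

If a datastore name appears on more than one row, treat the volume as multi-extent and do not run VOMA against it.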
Now log on to your ESXi host and take a look at the available VOMA parameters (voma -h).
Procedure:
To perform a VOMA check on a VMFS datastore and send the results to a specific log file, the command syntax is:
voma -m vmfs -d /vmfs/devices/disks/naa.00000000000000000000000000:1 -s /tmp/analysis.txt
where naa.00000000000000000000000000:1 is replaced with the NAA ID of the LUN and the partition to be checked. Note the “:1” at the end.
This is the number of the partition containing the datastore, and it must be specified (see the note below). If you run voma more than once, add the NAA ID and a time stamp to the output log file name, e.g.: -s /tmp/naa.00000000000000000000000000:1_analysis_<<hhmm>>.txt
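The log file naming advice can be sketched as a small shell helper. This is only an illustration: the function names voma_log_name and voma_check_cmd are hypothetical, the NAA ID is the placeholder used above, and the script prints the command for review instead of running it.

```shell
#!/bin/sh
# Hypothetical helpers, not part of VOMA.

# Build a log file name from the NAA ID with partition and a time stamp.
# $1: NAA ID with partition suffix, e.g. naa.0000...:1
# $2: time stamp; defaults to the current time as hhmm
voma_log_name() {
    stamp="${2:-$(date +%H%M)}"
    printf '/tmp/%s_analysis_%s.txt\n' "$1" "$stamp"
}

# Print the voma command line instead of executing it.
voma_check_cmd() {
    printf 'voma -m vmfs -d /vmfs/devices/disks/%s -s %s\n' \
        "$1" "$(voma_log_name "$1" "$2")"
}

voma_check_cmd "naa.00000000000000000000000000:1" "1430"
```

Printing the command first lets you confirm the device path and log file name before running anything against a production datastore.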
Note: VOMA must be run against the partition and not the device. If VOMA is run against a device, it produces an error similar to:
Error: Missing LVM Magic. Disk doesn’t have a valid LVM Device
Error: Failed to Initialize LVM Metadata
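To avoid the error above, you could check that the path ends in a partition number before calling voma. The guard below is a rough sketch, not part of VOMA: is_partition_path is a hypothetical name, and the test is a simple heuristic (the text after the last colon must be all digits).

```shell
#!/bin/sh
# Hypothetical guard, not part of VOMA: succeed only if the path ends in ":<digits>".
is_partition_path() {
    case "$1" in
        *:*) part="${1##*:}" ;;   # text after the last colon
        *)   return 1 ;;          # no colon at all: a bare device
    esac
    case "$part" in
        ''|*[!0-9]*) return 1 ;;  # empty or non-numeric suffix
        *)           return 0 ;;
    esac
}

device="/vmfs/devices/disks/naa.00000000000000000000000000:1"
if is_partition_path "$device"; then
    echo "OK: $device names a partition"
else
    echo "Refusing: $device has no partition suffix such as :1" >&2
fi
```

The guard only looks at the shape of the path; it does not confirm that the partition actually holds a VMFS datastore.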
I hope this information helps you crack your interview. Share it with others who need such real-time scenarios.
Happy learning, and all the best for your interview 🙂