Thursday, 18 December 2014

Tips for Troubleshooting VMware vSphere

Performance Monitoring Tools
The vSphere performance charts allow you to display useful information when you are connected either to the ESXi host directly or to the vCenter Server. The performance charts can provide a lot of useful information, even if they do not provide all of the counters that you will find with esxtop.

The host-based tool esxtop has some inherent advantages over the vSphere performance charts and third-party tools when it comes to performance analysis. One big advantage is that esxtop incurs very little overhead on the ESXi host. Because esxtop is lightweight and has a small footprint, it is an excellent tool for measuring performance. If poor performance is affecting connectivity to the host, you can use resxtop (remote esxtop). Another advantage of esxtop is that you can export the data to a comma-delimited file.
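The comma-delimited export comes from esxtop's batch mode (for example, esxtop -b -d 5 -n 10 > stats.csv). As a minimal sketch of how such a file could be examined afterwards, the Python below pulls out every column whose name contains a given string. The sample column names and values are hypothetical and only illustrate the shape of the data; a real export has many more columns with longer names.

```python
import csv
import io

def columns_matching(csv_text, needle):
    """Return {column_name: [float values]} for columns whose name contains needle."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    # Indexes of the columns whose header contains the search string (case-insensitive).
    wanted = [i for i, name in enumerate(header) if needle.lower() in name.lower()]
    result = {header[i]: [] for i in wanted}
    for row in reader:
        for i in wanted:
            result[header[i]].append(float(row[i]))
    return result

# Hypothetical two-sample export; the column names are illustrative only.
SAMPLE = (
    '"Time","host1 vmhba0 Reads/sec","host1 vmhba0 Writes/sec"\n'
    '"12/18/2014 10:00:00","120.0","80.0"\n'
    '"12/18/2014 10:00:05","100.0","60.0"\n'
)

reads = columns_matching(SAMPLE, "reads/sec")
for name, values in reads.items():
    print(name, values)
```

Matching on a substring rather than an exact name is a deliberate choice here, since the full column names depend on the host and device.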

If You Suspect a Network Performance Issue, Check Some of the Following Metrics
If droppedRx (receive) is greater than 0 for a host, look at the CPU utilization. High CPU utilization or CPU overhead can leave the VM too busy to accept new packets, or can delay the receipt of packets. A possible solution is to increase the CPU reservation for the VM, or to check whether the application supports adding more vCPUs.
If droppedTx (transmit) is greater than 0, this usually means congestion at the physical layer. When a VM transmits packets, they are queued in the buffer of the virtual switch port until they can be sent on the physical NIC. To prevent transmit packets from being dropped, look for ways to increase the physical network capacity, such as adding more NICs or moving to 10 Gb Ethernet.
Make sure you have the correct network device driver installed on the VM. By default, if VMware Tools is not installed or running, the Vlance network adapter will be used. Vlance is a 10 Mbps NIC, which is great for older 32-bit guest operating systems but not so useful on a 1 Gb Ethernet network.
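The first two checks above can be sketched as a small Python helper. The function name and the wording of the hints are made up for illustration; the counters are assumed to have been read by hand from the performance charts or esxtop.

```python
def network_drop_hints(dropped_rx, dropped_tx):
    """Map dropped-packet counters to the follow-up checks described above."""
    hints = []
    if dropped_rx > 0:
        # Receive drops: the VM may be too busy to take packets off the wire.
        hints.append("droppedRx > 0: check VM CPU utilization and overhead; "
                     "consider a CPU reservation or more vCPUs")
    if dropped_tx > 0:
        # Transmit drops: packets are backing up behind the physical NIC.
        hints.append("droppedTx > 0: likely physical congestion; "
                     "consider more NICs or 10 Gb Ethernet")
    return hints

for hint in network_drop_hints(dropped_rx=3, dropped_tx=0):
    print(hint)
```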

Metrics to Check for a Possible Storage Problem
esxtop, which comes with ESXi 5, and its remote counterpart resxtop are also excellent tools for measuring storage performance. One of the more significant statistics is the number of commands queued. To check these metrics, open a vSphere Management Assistant (vMA) console and start resxtop. Type d to switch to the Storage Adapter screen, then type f to select the fields that you want to view. The fields to view should be A (adapter name), F (queue stats), and K (error stats).

Other esxtop fields can also indicate a storage problem. To identify disk-related performance problems, look at throughput and latency.

Throughput fields in esxtop (READS/s + WRITES/s = I/O operations per second, or IOPS):
READS/s – Number of disk reads per second
WRITES/s – Number of disk writes per second
Latency fields in esxtop:
DAVG – Average delay from the adapter to the target, in ms. A value greater than 10 to 15 ms indicates that the storage might be slow or overutilized.
KAVG – Average delay from the VMkernel to the adapter, in ms. A value greater than 4 ms indicates that the VMs are attempting to send more data to storage than the storage can handle.
GAVG – Average delay as seen by the guest, in ms: GAVG = DAVG + KAVG.
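To make the arithmetic concrete, here is a minimal Python sketch that combines the throughput and latency fields. The function and constant names are hypothetical, the thresholds are the rule-of-thumb values quoted above (using the lower bound of the DAVG range), and the input numbers are invented sample readings.

```python
# Rule-of-thumb thresholds from the text above, in milliseconds.
DAVG_WARN_MS = 10.0
KAVG_WARN_MS = 4.0

def storage_summary(reads_per_s, writes_per_s, davg_ms, kavg_ms):
    """Derive IOPS and GAVG from esxtop readings and flag high latencies."""
    return {
        "iops": reads_per_s + writes_per_s,   # READS/s + WRITES/s
        "gavg_ms": davg_ms + kavg_ms,         # GAVG = DAVG + KAVG
        "davg_high": davg_ms > DAVG_WARN_MS,  # storage may be slow or overutilized
        "kavg_high": kavg_ms > KAVG_WARN_MS,  # VMs may be sending more than storage can handle
    }

# Invented sample readings for illustration.
print(storage_summary(reads_per_s=120, writes_per_s=80, davg_ms=12.5, kavg_ms=0.5))
```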

Log Files to View in vSphere 5
All log messages are now generated by syslog, and messages can be logged locally, to one or more remote log servers, or both. In addition, a given log server can log messages from more than one ESXi host.

To view ESXi system logs, in the vSphere Client menu bar, select View > Administration > System Logs.

/var/log/auth.log          ESXi Shell authentication success and failure
/var/log/dhclient.log      DHCP client service
/var/log/esxupdate.log     ESXi patches and updates log
/var/log/hostd.log         Host management service logs
/var/log/shell.log         ESXi Shell usage, including enable/disable and every command entered
/var/log/sysboot.log       VMkernel startup and module loading
/var/log/syslog.log        Management service initialization, watchdogs, scheduled tasks, DCUI
/var/log/usb.log           USB device arbitration events, such as discovery and pass-through to VM
/var/log/vob.log           VMkernel Observation events, similar to vob.component.event
/var/log/vmkernel.log      Core VMkernel logs (devices, storage/network device/driver events, and VM startup)
/var/log/vmkwarning.log    VMkernel Warning and Alert log messages
/var/log/vmksummary.log    ESXi startup/shutdown, uptime, VMs running, and service resource consumption

Logs from vCenter Server Components on ESXi 5

/var/log/vpxa.log          vCenter vpxa agent logs
/var/log/fdm.log           High Availability logs, produced by the Fault Domain Manager (FDM) service

Last-Level Cache (LLC) Performance Issue
By default, the ESXi CPU scheduler places the vCPUs of a symmetric multiprocessing (SMP) VM across as many last-level caches (LLCs) as possible. ESXi therefore spreads the vCPUs out to find room to run the workload. If you are running a very CPU-intensive workload, you might benefit from setting up a clone of the application VM. Then turn on the LLC setting below and test the cloned application to see if there is a performance increase. With the setting enabled, the CPU scheduler attempts to consolidate the vCPUs of the SMP VM into one CPU package, and thus one shared LLC pool. The scheduler will then run the VM on the same package more often than it would otherwise.

Using the vSphere client:

Power off the VM.
Right click the VM and select Edit Settings.
Select the Options tab.
Under Advanced, click General, and on the right click the Configuration Parameters button.
Click Add Row.
Add sched.cpu.vsmpConsolidate with the value true.
Power on the VM.

From the command line interface:

Power off the VM.
Add the following line to the configuration file (.vmx) of the VM:
sched.cpu.vsmpConsolidate = "true"
Power on the VM.
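
If you prefer to script the .vmx change rather than edit the file by hand, the following Python is a minimal sketch of the idea. The helper name is hypothetical, it works on the file contents as a string, and it assumes the simple key = "value" line format shown above. The VM should still be powered off while the file is changed, and it is sensible to keep a copy of the original.

```python
def set_vmx_option(vmx_text, key, value):
    """Return vmx_text with key set to value, replacing an existing line or appending one."""
    new_line = f'{key} = "{value}"'
    lines = vmx_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        # Compare only the part before the first '=' with the requested key.
        name = line.split("=", 1)[0].strip()
        if name == key:
            lines[i] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    return "\n".join(lines) + "\n"

# Illustrative input; a real .vmx file has many more entries.
original = 'memsize = "4096"\n'
print(set_vmx_option(original, "sched.cpu.vsmpConsolidate", "true"))
```

Replacing an existing entry instead of always appending avoids ending up with two conflicting lines for the same option.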

Cannot Migrate a VM Using vMotion
Check the ESX(i) hosts to make sure all of the requirements have been met. Then check that the CPUs are compatible. If the VM is running a 64-bit operating system, the problem might be that the source host has Intel Virtualization Technology (VT) enabled in the BIOS and the destination host does not. If this is the case, you will have to change the BIOS setting so that both hosts match. Also, both hosts must have a VMkernel port on the same LAN. The IP address and subnet mask of each VMkernel port should be consistent with the VMkernel gateway configuration.

To test the VMkernel TCP/IP stack, you could run the following from the command line:
# vmkping <Destination_IP_address>
Any VLAN settings should match the VLAN configuration of the local LAN. The VMkernel ports should have the vMotion check box enabled. There should be no router separating the hosts.
Check the VMs to make sure all of the requirements have been met.
Check that there are no local devices connected to the VM.
This includes CD-ROM mappings to an ISO file on local storage, floppy, SCSI, and USB devices, CPU affinity settings, and .vswp files stored on local storage.

Check that the destination host has enough CPU and memory resources for the VM.
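
Before reaching for vmkping, the requirement that both VMkernel ports sit on the same LAN with no router between them can be sanity-checked on paper. The Python sketch below compares two addresses under one subnet mask; the function name and the addresses are made up for illustration, and it assumes both ports use the same mask.

```python
import ipaddress

def same_subnet(ip_a, ip_b, netmask):
    """Return True if both addresses fall in the same network for the given mask."""
    # strict=False lets us pass a host address and have it reduced to its network.
    net_a = ipaddress.ip_network(f"{ip_a}/{netmask}", strict=False)
    net_b = ipaddress.ip_network(f"{ip_b}/{netmask}", strict=False)
    return net_a == net_b

# Hypothetical VMkernel addresses on the source and destination hosts.
print(same_subnet("10.0.10.11", "10.0.10.12", "255.255.255.0"))
print(same_subnet("10.0.10.11", "10.0.20.12", "255.255.255.0"))
```

A matching subnet does not prove connectivity (VLAN or cabling problems can still break it), so vmkping remains the real test.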
