All of the Advanced Options for HA
We managed to gather all of the HA advanced options. With ESX 3.5 Update 2, VMware added a couple of extra advanced options; this is the complete list:
• das.failuredetectiontime - Number of milliseconds; the timeout before the isolation response action is triggered (default: 15000 milliseconds).
• das.isolationaddress[x] - IP address the ESX host pings to check for network isolation when heartbeats stop, where [x] = 1-10. The default gateway is used by default.
• das.usedefaultisolationaddress - Value can be true or false; set it to false when the default gateway, which is the default isolation address, should not be used for this purpose.
• das.poweroffonisolation - Value can be false or true; this sets the isolation response. By default a VM will be powered off.
• das.vmMemoryMinMB - Higher values reserve more memory capacity for failovers.
• das.vmCpuMinMHz - Higher values reserve more CPU capacity for failovers.
• das.defaultfailoverhost - Value is a hostname; this host will be the primary failover host.
The new ones:
• das.failuredetectioninterval - Changes the heartbeat interval among HA hosts. By default, this occurs every second (1000 milliseconds).
• das.allowVmotionNetworks - Allows a NIC that is used for VMotion networks to be considered for VMware HA usage. This permits a host to have only one NIC configured for management and VMotion combined.
• das.allowNetwork[x] - Enables the use of port group names to control the networks used for VMware HA, where [x] = 0 - ?. You can set the value to be Service Console 2 or Management Network to use (only) the networks associated with those port group names in the networking configuration.
• das.isolationShutdownTimeout - Shutdown timeout for the isolation response “Shutdown VM”; the default is 300 seconds. In other words, if a VM hasn’t shut down cleanly once the isolation response occurs, it is powered off after 300 seconds.
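These options are entered as name/value pairs under Cluster Settings > VMware HA > Advanced Options in the VI Client. A minimal sketch (the values below are illustrative, not recommendations):
das.failuredetectiontime = 60000
das.isolationaddress1 = 192.168.1.1
das.usedefaultisolationaddress = false
Changed options may require HA to be reconfigured on the cluster (disable and re-enable it) before they take effect.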
VMFS Deep Dive
Agenda
VMFS Deep Dive
- ESX Storage Stack and VMFS
- VMFS vs. RDM
- SCSI reservation conflicts
- Multipathing
- Snapshot LUNs and resignaturing
The Storage Stack in VI3
VMFS – A Clustered filesystem for today’s dynamic IT world
Ø Built-In VMFS Cluster File System
Ø Simplifies VM provisioning
Ø Enables independent VMotion and HA restart of VMs in a common LUN
Ø File-level locking protects virtual disks
Ø Separates VM and storage administration
Ø Use RDMs for access to SAN features
Raw Disk Mapping (RDM)
Mapping files in a VMFS volume
Ø Presented as virtual SCSI device
Ø Key contents of the metadata include the location and locking of the mapped device
Ø Used when a virtual machine must interact with a real disk on the SAN
Ø Used by Microsoft Cluster Service (MSCS) configurations
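As a sketch of how an RDM is created from the service console (device and datastore paths below are hypothetical): vmkfstools writes the mapping file into a VMFS volume, with -r for virtual compatibility mode and -z for physical compatibility mode (the mode MSCS shared disks need):
# virtual compatibility mode mapping file (paths are examples)
vmkfstools -r /vmfs/devices/disks/vmhba1:0:6:0 /vmfs/volumes/datastore1/vm1/vm1-rdm.vmdk
# physical compatibility mode, e.g. for MSCS shared disks
vmkfstools -z /vmfs/devices/disks/vmhba1:0:6:0 /vmfs/volumes/datastore1/vm1/vm1-rdmp.vmdk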
Storage – VMFS vs. RDM
RAW (RDM)
- May give better performance
- Means more LUNs, hence more provisioning time
- Advanced SAN features still work
VMFS
- Leverage templates and quick provisioning
- Fewer LUNs means you don’t have to watch heap usage
- Scales better with Consolidated Backup
- Preferred method
Skeleton of a VMFS
A VMFS holds files and has its own metadata
Metadata gets updated through
- Creating a file
- Changing a file’s attributes
- Powering on a VM
- Powering off a VM
- Growing a file
- When metadata is updated, the VMkernel places a non-persistent SCSI reservation on the entire VMFS volume
- Lock held on volume for the duration of the operation
- Other VMkernels are prevented from doing metadata updates
VMFS 3 & SCSI Reservations
- Concurrent-access filesystem
- Most I/O happens simultaneously from all hosts
- Filesystem metadata updates are atomic and performed by the requesting host
- Locking a file for read/write (e.g. vmdk when powering on VM)
- Creating a new directory or file
- Growing a file etc.
For the time needed by the locking operation (NOT the metadata update itself), the LUN is reserved (= locked for access) by a single host
SCSI Reservation Conflict – What it is
What happens if we try to perform I/O to a LUN that’s already reserved?
- A retry counter is decreased and the I/O operation is retried
- The retry is scheduled with a pseudo-random algorithm
- If the counter reaches 0, we have a SCSI reservation conflict
SCSI: 6630: Partition table read from device vmhba1:0:6 failed: SCSI reservation conflict (0xbad0022)
SCSI: vm 1033: 5531: Sync CR at 64
SCSI: vm 1033: 5531: Sync CR at 48
SCSI: vm 1033: 5531: Sync CR at 32
SCSI: vm 1033: 5531: Sync CR at 16
SCSI: vm 1033: 5531: Sync CR at 0
WARNING: SCSI: 5541: Failing I/O due to too many reservation conflicts
WARNING: SCSI: 5637: status SCSI reservation conflict, rstatus 0xc0de01 for vmhba1:0:6. residual R 919, CR 0, ER 3
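A quick way to gauge how often this is happening is to search the VMkernel log (the same log quoted above; see also the HP note later in this document):
# count reservation conflict messages logged so far
grep -c "reservation conflict" /var/log/vmkernel
# watch for new conflicts live while reproducing the issue
tail -f /var/log/vmkernel | grep -i reservation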
Who’s holding a SCSI Reservation?
One ESX host (persistent reservation)
- vmkfstools -L reserve : This should NEVER EVER be done
- Interaction with installed third-party management agents
Multiple ESX hosts, in alternation
- High latency / slow SAN
o Critical lock-passing between ESX hosts during VMotion
- SAN firmware slow in honoring SCSI reserve/release
o Synchronously mirrored LUNs
One non-ESX host
- LUN erroneously mapped to e.g. a Windows host
No host
- Persistent reservation held by the SAN
- Needs investigation by the SAN vendor
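When a reservation is confirmed stuck, the lock options of vmkfstools can clear it from the service console. These are disruptive, last-resort commands (device path below is an example):
# release a reservation held by this host
vmkfstools -L release /vmfs/devices/disks/vmhba1:0:6:0
# reset the LUN to clear a reservation held by another initiator
vmkfstools -L lunreset /vmfs/devices/disks/vmhba1:0:6:0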
ESX Server Multipathing
Multipathing – vmhbaN:T:L:P notation
Determined at boot, install, or rescan:
- N = adapter number
- T = target number (generally 1 SP = 1 target)
Determined by the SAN
- L = LUN ID
- SCSI identifier of the LUN (not shown here)
Determined at datastore or extent creation
- P = partition number (if 0 or absent = whole disk)
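The full path list, with the vmhbaN:T:L:P names and each path's state, can be dumped from the service console:
# list all LUNs and their paths (ESX 3.x)
esxcfg-mpath -l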
Per-LUN Multipathing Failover Policy
VMware supports using only one path at a time
- MRU = Most Recently Used
- Fixed = choose a preferred path & failback to it
- With multiple ESX hosts or multiple LUNs, Fixed allows for manual load balancing between SPs
Never set up a Fixed policy with an active/passive SAN! Why?
Path Thrashing
Ø Only possible on active/passive SANs
Ø Host 1 needs access to the LUN through SP1
Ø Host 2 needs access to the LUN through SP2
Ø The LUN keeps being trespassed between SPs and it’s never available for I/O
Multipathing
Active/Active
- LUNs presented on multiple Storage Processors
- Fixed path policy
Failover on NO_CONNECT
Preferred path policy
Failback to preferred path if it recovers
Active/Passive
- LUNs presented on a single Storage Processor
- MRU (Most Recently Used) path policy
Failover on NOT_READY, ILLEGAL_REQUEST or NO_CONNECT
No preferred path policy, no failback to preferred path
Load Balancing
- Fixed (Preferred Path)
1st active path discovered or user configured.
Active/Active arrays only
- Most recently used (MRU)
Active/Active arrays
Active/Passive arrays
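The path policy is normally changed per LUN in the VI Client (LUN properties > Manage Paths). It can also be set from the service console with esxcfg-mpath; the flag syntax below is an assumption from memory of ESX 3.x and should be verified against the command's help output:
# set MRU on a LUN behind an active/passive array (flag names are an assumption)
esxcfg-mpath --policy=mru --lun=vmhba1:0:6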
Snapshot LUNs and Resignaturing
How VMware ESX Identifies Disks
Ø Each LUN has a SCSI identifier string provided by the SAN vendor
Ø The SCSI ID stays the same amongst different paths
Ø The vmkernel identifies disks with a combination of LUN ID, SCSI ID and part of the model string
# ls -l /vmfs/devices/disks/
total 179129968
-rwxrwxrwx 1 root root 72833679360 Nov 13 12:16 vmhba0:0:0:0
lrwxrwxrwx 1 root root 58 Nov 13 12:16 vmhba1:0:0:0 -> vml.020000000060060160432017002a547c3e7893dc11524149442035
lrwxrwxrwx 1 root root 58 Nov 13 12:16 vmhba1:0:1:0 -> vml.02000100006006016043201700a99d1c3bb9c5dc11524149442035
lrwxrwxrwx 1 root root 58 Nov 13 12:16 vmhba1:0:10:0 -> vml.02000a000060060160432017000db2f61d17d3dc11524149442035
(...)
Snapshot LUNs & Resignaturing – Key Facts
Ø ESX identifies objects in a VMFS datastore by path e.g. /vmfs/volumes/
Ø The VMFS UUID (aka signature) is generated at VMFS creation
Ø The VMFS header includes hashed information about the disk where it’s been created
The Check for Snapshot LUNs
- VMFS relies on SCSI reservations to acquire on-disk locks, which in turn enforce atomicity of filesystem metadata updates
- SCSI reservations don’t work across mirrored LUNs
- To avoid corruption, we need to prevent mounting a datastore and a copy of it at the same time
Ø On rescan, the information about the disk in the VMFS header metadata (m/d) is checked against the actual values
Ø If any field doesn’t match, the VMFS is not mounted and ESX complains that it’s a snapshot LUN
LVM: 5739: Device vmhba1:0:1:1 is a snapshot:
LVM: 5745: disk ID:
LVM: 5747: m/d disk ID:
ALERT: LVM: 4903: vmhba1:0:1:1 may be snapshot: disabling access. See resignaturing section in SAN config guide.
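Both the log message and the current LVM settings can be checked from the service console:
# find LUNs the VMkernel flagged as snapshots
grep "is a snapshot" /var/log/vmkernel
# show the current snapshot-handling settings (0/1)
esxcfg-advcfg -g /LVM/DisallowSnapshotLun
esxcfg-advcfg -g /LVM/EnableResignature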
LUNs Detected as Snapshots – Causes
Ø LUN ID mismatch
Ø SCSI ID change (e.g. LUN copied to a new SAN)
Ø They are effectively snapshots (e.g. DR site)
LUNs Detected as Snapshots – How to Fix
Are they mirrored/snapshot LUNs?
- If yes: will the ESX host(s) ever see both original and copy at the same time?
Yes – resignature
No – either allow snapshots or resignature
- If no: do multiple ESX hosts see the same LUN with different IDs?
Yes – fix the SAN config; if not possible allow snapshots
No – IDs permanently changed: either allow snapshots or resignature (see the sketch below)
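The two outcomes map to the LVM advanced settings, set either in the VI Client (Configuration > Advanced Settings > LVM) or from the service console. A minimal sketch; pick one of the two options, and consider reverting EnableResignature to 0 once the volumes have been resignatured:
# option A: allow the snapshot LUN to be mounted with its existing signature
esxcfg-advcfg -s 0 /LVM/DisallowSnapshotLun
# option B: write a new signature on the next rescan
esxcfg-advcfg -s 1 /LVM/EnableResignature
# rescan so the setting is acted upon (adapter name is an example)
esxcfg-rescan vmhba1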
Resignaturing Issues
Never ever resignature while the VMs are running
- resignaturing implies changing UUID and datastore name
- All paths to filesystem objects (vmdks, VMs) will become invalid!
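Because the resignatured volume comes back under a new UUID and name (typically snap-XXXXXXXX-<oldname>), the VMs on it have to be re-registered. A hypothetical sketch, with illustrative paths:
# the volume reappears under its new name
ls /vmfs/volumes/
# re-register a VM from the new path
vmware-cmd -s register /vmfs/volumes/snap-00000001-datastore1/vm1/vm1.vmx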
How to Change the Polling Interval of the cmafcad Fiber Channel Agent
Information
***********
The HP Management Agents for VMware ESX Server 3.x include a Fibre Channel Agent (FCA agent) called cmafcad. If SCSI reservation conflicts on the ESX host are resulting in failed I/O or performance issues, it can be necessary to increase the polling interval of the Fibre Channel Agent. This can reduce the number of SCSI reservation conflicts that are typical during peak business hours, VM startup, and VMotion operations.
Details
******
The following can be seen in the /var/log/vmkernel file:
WARNING: SCSI: 5446: Failing I/O due to too many reservation conflicts
WARNING: SCSI: 5541: status 0xbad0022, rstatus 0xc0de01 for vmhba1:0:0. residual R 919, CR 0, ER 3
WARNING: Fil3: 1538: Failed to reserve volume
NOTE: 0xbad0022 translates to VMK_RESERVATION_CONFLICT per vmkerrcode.
Although reservation conflicts are not always an indication of a problem, large numbers of reservation conflicts resulting in failed I/O are, and should be addressed. Many things can contribute to reservation conflicts in a Virtual Infrastructure environment. Be advised that the following is only one possible solution to this issue; other possible causes should be investigated as well.
Increasing the polling interval of the FCA agent can reduce SCSI reservation conflicts on the host by decreasing the number of reservations issued against a given LUN.
The following steps show the procedure for increasing the polling interval:
1. Log in to the ESX server from an SSH client or from iLO.
2. Using an editor such as nano or vi, open the file /opt/compaq/storage/etc/cmafcad.
3. Change the polling interval from 15 seconds to a larger span, such as 60 seconds.
Look for the variable PFLAGS. By default, it looks like this: PFLAGS="-p 15 -s OK"
Change it to the desired value: PFLAGS="-p 60 -s OK"
4. Save the file, and exit from the editor.
5. Restart the management agents on the host. The following shows how this is done with the 8.0.0 Management Agents. See the appropriate documentation or man pages for later agents.
# service hpasm stop
# service hpsmhd restart
# service hpasm start
After the restart, a ps listing should show the new setting:
# ps -auxwww | grep cmafcad
root 14557 0.0 0.9 14676 2452 pts/1 S 18:31 0:00 cmafcad -p 60 -s OK
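For scripted rollouts across many hosts, steps 2 through 4 can be collapsed into a one-liner. A sketch that assumes the default PFLAGS line shown above; back the file up first:
# back up, then raise the cmafcad polling interval from 15 to 60 seconds
cp -p /opt/compaq/storage/etc/cmafcad /opt/compaq/storage/etc/cmafcad.bak
sed -i 's/-p 15/-p 60/' /opt/compaq/storage/etc/cmafcad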