VMFS Deep Dive
Agenda
- ESX Storage Stack and VMFS
- VMFS vs. RDM
- SCSI reservation conflicts
- Multipathing
- Snapshot LUNs and resignaturing
The Storage Stack in VI3
VMFS – A Clustered filesystem for today’s dynamic IT world
Ø Built-In VMFS Cluster File System
Ø Simplifies VM provisioning
Ø Enables independent VMotion and HA restart of VMs in a common LUN
Ø File-level locking protects virtual disks
Ø Separates VM and storage administration
Ø Use RDMs for access to SAN features
Raw Device Mapping (RDM)
Mapping files in a VMFS volume
Ø Presented as virtual SCSI device
Ø Key contents of the metadata include location and locking of mapped device
Ø Virtual machine must interact with a real disk on the SAN
Ø Microsoft Cluster Services (MSCS)
Storage – VMFS vs. RDM
RAW | VMFS
RAW may give better performance | Leverage templates and quick provisioning
RAW means more LUNs - More provisioning time | Fewer LUNs means you don’t have to watch Heap
Advanced features still work | Scales better with Consolidated Backup
 | Preferred Method
Skeleton of a VMFS
A VMFS holds files and has its own metadata
Metadata gets updated through
- Creating a file
- Changing a file’s attributes
- Powering on a VM
- Powering off a VM
- Growing a file
When metadata is updated, the VMkernel places a non-persistent SCSI reservation on the entire VMFS volume:
- The lock is held on the volume for the duration of the operation
- Other VMkernels are prevented from doing metadata updates
VMFS 3 & SCSI Reservations
- Concurrent-access filesystem
- Most I/O happens simultaneously from all hosts
- Filesystem metadata updates are atomic and performed by the requesting host
- Locking a file for read/write (e.g. vmdk when powering on VM)
- Creating a new directory or file
- Growing a file etc.
For the time needed by the locking operation (NOT the metadata update), the LUN is reserved for a single host (= locked against access by any other host)
SCSI Reservation Conflict – What it is
What happens if we try to perform I/O to a LUN that’s already reserved?
- A retry counter is decremented and the I/O operation is retried
- The retry is scheduled with a pseudo-random algorithm
- If the counter reaches 0, we have a SCSI reservation conflict
SCSI: 6630: Partition table read from device vmhba1:0:6 failed: SCSI reservation conflict (0xbad0022)
SCSI: vm 1033: 5531: Sync CR at 64
SCSI: vm 1033: 5531: Sync CR at 48
SCSI: vm 1033: 5531: Sync CR at 32
SCSI: vm 1033: 5531: Sync CR at 16
SCSI: vm 1033: 5531: Sync CR at 0
WARNING: SCSI: 5541: Failing I/O due to too many reservation conflicts
WARNING: SCSI: 5637: status SCSI reservation conflict, rstatus 0xc0de01 for vmhba1:0:6. residual R 919, CR 0, ER 3
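The retry behaviour described above can be sketched in Python. This is a toy model rather than VMkernel code: the names `issue_io_with_retries`, `try_io` and `ReservationConflict`, the starting counter of 4 and the delay range are all assumptions made for illustration, and the delays are only recorded, not slept.

```python
import random


class ReservationConflict(Exception):
    """Raised when the retry counter reaches 0 (toy model, not a VMkernel API)."""


def issue_io_with_retries(try_io, retries=4, rng=None, max_delay_ms=100):
    """Retry an I/O against a reserved LUN until it succeeds or the counter hits 0.

    try_io: hypothetical callable returning True when the LUN accepted the I/O.
    Returns the list of pseudo-random delays (ms) scheduled between attempts.
    """
    rng = rng or random.Random()
    delays = []
    while True:
        if try_io():
            return delays
        # The LUN is reserved by another host: decrement the retry counter.
        retries -= 1
        if retries <= 0:
            raise ReservationConflict("too many reservation conflicts")
        # Pseudo-random delay so competing hosts do not retry in lockstep.
        delays.append(rng.randint(1, max_delay_ms))
```

If the reserving host releases the LUN before the counter runs out, the I/O goes through and no conflict is reported; only an exhausted counter surfaces as an error.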
Who’s holding a SCSI Reservation?
One ESX host (persistent reservation)
- vmkfstools -L reserve: this should NEVER EVER be done
- Interaction with installed third-party management agents
Multiple ESX hosts, alternatively
- High latency/slow SAN
o Critical lock-passing between ESX hosts during VMotion
- SAN firmware slow in honoring SCSI reserve/release
o Synchronously mirrored LUNs
One non-ESX host
- LUN erroneously mapped to e.g. a Windows host
No host
- Persistent reservation held by the SAN
- Needs investigation by the SAN vendor
ESX Server Multipathing
Multipathing – vmhbaN:T:L:P notation
Determined at boot, install / rescan:
- N = adapter number
- T = target number (generally 1 SP = 1 target)
Determined by the SAN
- L = LUN ID
- SCSI identifier of the LUN (not shown here)
Determined at datastore or extent creation
- P = partition number (if 0 or absent = whole disk)
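As an illustration of the notation, the following Python sketch splits a vmhbaN:T:L:P name into its fields. The names `VmhbaPath`, `parse_vmhba` and `is_whole_disk` are hypothetical helpers, not part of any VMware tool.

```python
import re
from typing import NamedTuple, Optional


class VmhbaPath(NamedTuple):
    adapter: int              # N: adapter number
    target: int               # T: target number
    lun: int                  # L: LUN ID
    partition: Optional[int]  # P: partition number; None when absent


_VMHBA_RE = re.compile(r"^vmhba(\d+):(\d+):(\d+)(?::(\d+))?$")


def parse_vmhba(name: str) -> VmhbaPath:
    """Split a vmhbaN:T:L[:P] name into its numeric fields."""
    match = _VMHBA_RE.match(name)
    if match is None:
        raise ValueError(f"not a vmhbaN:T:L[:P] name: {name!r}")
    n, t, l, p = match.groups()
    return VmhbaPath(int(n), int(t), int(l), None if p is None else int(p))


def is_whole_disk(path: VmhbaPath) -> bool:
    """Per the notation above, P of 0 or absent means the whole disk."""
    return path.partition in (None, 0)
```

For example, `parse_vmhba("vmhba1:0:6")` yields adapter 1, target 0, LUN 6 and no partition, which `is_whole_disk` treats as the whole disk.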
Per-LUN Multipathing Failover Policy
VMware supports using only one path at a time
- MRU = Most Recently Used
- Fixed = choose a preferred path & failback to it
- With multiple ESX hosts or multiple LUNs, Fixed allows for manual load balancing between SPs
Never set up a Fixed policy with an active/passive SAN! Why?
Path Thrashing
Ø Only possible on active/passive SANs
Ø Host 1 needs access to the LUN through SP1
Ø Host 2 needs access to the LUN through SP2
Ø The LUN keeps being trespassed between SPs and it’s never available for I/O
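A small Python simulation makes the ping-pong effect concrete. It is a toy model of an active/passive array, assuming that any I/O arriving on the non-owning SP forces a trespass; `count_trespasses` and the SP names are hypothetical.

```python
def count_trespasses(requests, initial_owner="SP1"):
    """Count LUN ownership changes on a toy active/passive array.

    requests: sequence of SP names, one per I/O, in arrival order.
    An I/O arriving on the non-owning SP forces a trespass to that SP.
    """
    owner = initial_owner
    trespasses = 0
    for sp in requests:
        if sp != owner:
            owner = sp  # the LUN is trespassed to the requesting SP
            trespasses += 1
    return trespasses


# Host 1 is pinned to SP1 and Host 2 to SP2; their I/Os interleave.
thrashing = ["SP1", "SP2"] * 4
# Both hosts agree on SP1.
agreed = ["SP1"] * 8
```

With the interleaved pattern almost every I/O triggers a trespass, while the agreed pattern triggers none.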
Multipathing
Active/Active
- LUNs presented on multiple Storage Processors
- Fixed path policy
Failover on NO_CONNECT
Preferred path policy
Failback to preferred path if it recovers
Active/Passive
- LUNs presented on a single Storage Processor
- MRU (Most Recently Used) path policy
Failover on NOT_READY, ILLEGAL_REQUEST or NO_CONNECT
No preferred path policy, no failback to preferred path
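The difference between the two policies can be sketched as a path-selection function. This is a simplified model, not the ESX implementation: `select_path`, its parameters and the boolean "usable" flag (standing in for conditions such as NO_CONNECT) are assumptions for illustration.

```python
def select_path(policy, paths, current, preferred=None):
    """Pick the path to use for a LUN.

    policy:    "fixed" or "mru"
    paths:     dict mapping path name -> True if the path is usable
    current:   path in use before this call
    preferred: preferred path (used by "fixed" only)
    """
    usable = [name for name, ok in paths.items() if ok]
    if not usable:
        raise RuntimeError("no usable path to the LUN")
    if policy == "fixed" and preferred is not None and paths.get(preferred):
        return preferred  # Fixed: fail back whenever the preferred path is up
    if paths.get(current):
        return current    # MRU: stay on the path already in use
    return usable[0]      # fail over to another usable path
```

When the preferred path recovers, the Fixed policy moves back to it, whereas MRU stays on whichever path it failed over to.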
Load Balancing
- Fixed (Preferred Path)
1st active path discovered or user configured.
Active/Active arrays only
- Most recently used (MRU)
Active/Active arrays
Active/Passive arrays
Snapshot LUNs and Resignaturing
How VMware ESX Identifies Disks
Ø Each LUN has a SCSI identifier string provided by the SAN vendor
Ø The SCSI ID stays the same across different paths
Ø The VMkernel identifies disks with a combination of LUN ID, SCSI ID and part of the model string
# ls -l /vmfs/devices/disks/
total 179129968
-rwxrwxrwx 1 root root 72833679360 Nov 13 12:16 vmhba0:0:0:0
lrwxrwxrwx 1 root root 58 Nov 13 12:16 vmhba1:0:0:0 -> vml.020000000060060160432017002a547c3e7893dc11524149442035
lrwxrwxrwx 1 root root 58 Nov 13 12:16 vmhba1:0:1:0 -> vml.02000100006006016043201700a99d1c3bb9c5dc11524149442035
lrwxrwxrwx 1 root root 58 Nov 13 12:16 vmhba1:0:10:0 -> vml.02000a000060060160432017000db2f61d17d3dc11524149442035
(...)
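The vml names in the listing appear to encode the three components mentioned above. The Python sketch below decodes them using a field layout that is only inferred from these three sample names (2 hex characters of prefix, 4 of LUN ID, 4 of unknown meaning, 32 of SCSI ID, then ASCII bytes); treat that layout, and the names `VmlName` and `decode_vml`, as assumptions rather than a documented format.

```python
from typing import NamedTuple


class VmlName(NamedTuple):
    lun_id: int      # LUN ID as presented to the host
    scsi_id: str     # SCSI identifier string, as hex
    model_part: str  # trailing bytes decoded as ASCII (part of the model string)


def decode_vml(name: str) -> VmlName:
    """Decode a vml.* name using a layout ASSUMED from the sample listing:
    2 hex chars (prefix) + 4 (LUN ID) + 4 (unknown) + 32 (SCSI ID) + rest (ASCII).
    """
    if not name.startswith("vml."):
        raise ValueError(f"not a vml name: {name!r}")
    body = name[len("vml."):]
    if len(body) < 42:
        raise ValueError("name too short for the assumed layout")
    lun_id = int(body[2:6], 16)
    scsi_id = body[10:42]
    model_part = bytes.fromhex(body[42:]).decode("ascii")
    return VmlName(lun_id, scsi_id, model_part)
```

Applied to the target of vmhba1:0:10:0 in the listing, this gives LUN ID 10, consistent with the L field of the vmhba name.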
Snapshot LUNs & Resignaturing – Key Facts
Ø ESX identifies objects in a VMFS datastore by path e.g. /vmfs/volumes/
Ø The VMFS UUID (aka signature) is generated at VMFS creation
Ø The VMFS header includes hashed information about the disk where it’s been created
The Check for Snapshot LUNs
- VMFS relies on SCSI reservations to acquire on-disk locks, which in turn enforce atomicity of filesystem metadata updates
- SCSI reservations don’t work across mirrored LUNs
- To avoid corruption, we need to prevent mounting a datastore and a copy of it at the same time
Ø On rescan, the information about the disk in the VMFS header metadata (m/d) is checked against the actual values
Ø If any of the fields doesn’t match, the VMFS is not mounted and ESX complains it’s a snapshot LUN
LVM: 5739: Device vmhba1:0:1:1 is a snapshot:
LVM: 5745: disk ID:
LVM: 5747: m/d disk ID:
ALERT: LVM: 4903: vmhba1:0:1:1 may be snapshot: disabling access. See resignaturing section in SAN config guide.
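The rescan check amounts to a field-by-field comparison. The Python sketch below models it with plain dictionaries; the field names (`lun_id`, `scsi_id`, `model`) and the functions `find_mismatches` and `should_mount` are hypothetical, since the actual on-disk header format is not described here.

```python
def find_mismatches(header_md, actual):
    """Return the names of fields whose recorded value differs from the device's.

    header_md: values recorded in the VMFS header at creation (hypothetical dict)
    actual:    values the host reads from the device on rescan
    """
    return sorted(k for k in header_md if header_md[k] != actual.get(k))


def should_mount(header_md, actual):
    """Mount only when every recorded field matches; otherwise treat as a snapshot."""
    return not find_mismatches(header_md, actual)
```

A single differing field, such as a changed LUN ID, is enough for the volume to be treated as a snapshot and left unmounted.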
LUNs Detected as Snapshots – Causes
Ø LUN ID mismatch
Ø SCSI ID change (e.g. LUN copied to a new SAN)
Ø They are effectively snapshots (e.g. DR site)
LUNs Detected as Snapshots – How to Fix
Are they mirrored/snapshot LUNs?
- If yes: will the ESX host(s) ever see both original and copy at the same time?
Yes – resignature
No – either allow snapshots or resignature
- If no: do multiple ESX hosts see the same LUN with different IDs?
Yes – fix the SAN config; if not possible allow snapshots
No – IDs permanently changed: either allow snapshots or resignature
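The decision tree above can be written down as a small function. This is only a restatement of the slide in Python; `snapshot_fix` and its parameter names are made up for illustration, and the returned strings are labels, not commands.

```python
def snapshot_fix(is_copy, sees_both=False, inconsistent_ids=False, san_fixable=True):
    """Return the acceptable action(s) for a LUN detected as a snapshot.

    is_copy:          the LUN is a mirror or snapshot of another LUN
    sees_both:        a host will ever see original and copy at the same time
    inconsistent_ids: different hosts see the same LUN with different IDs
    san_fixable:      the SAN configuration can be corrected
    """
    if is_copy:
        if sees_both:
            return ["resignature"]
        return ["allow snapshots", "resignature"]
    if inconsistent_ids:
        return ["fix SAN config"] if san_fixable else ["allow snapshots"]
    # Not a copy and IDs are consistent across hosts: the IDs changed permanently.
    return ["allow snapshots", "resignature"]
```

For instance, a DR copy that a host could see alongside the original leaves resignaturing as the only option.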
Resignaturing Issues
Never ever resignature while the VMs are running
- Resignaturing implies changing the UUID and the datastore name
- All paths to filesystem objects (vmdks, VMs) will become invalid!