
VMFS Deep Dive

Tuesday, April 28, 2009 , Posted by Virtualbox at 11:46 PM



VMFS Deep Dive

- ESX Storage Stack and VMFS


- SCSI reservation conflicts

- Multipathing

- Snapshot LUNs and resignaturing

The Storage Stack in VI3

VMFS – A Clustered filesystem for today’s dynamic IT world

Ø Built-In VMFS Cluster File System

Ø Simplifies VM provisioning

Ø Enables independent VMotion and HA restart of VMs on a common LUN

Ø File-level locking protects virtual disks

Ø Separates VM and storage administration

Ø Use RDMs for access to SAN features

Raw Device Mapping (RDM)

Mapping files in a VMFS volume

Ø Presented as virtual SCSI device

Ø Key contents of the metadata include location and locking of mapped device

Ø Use when a virtual machine must interact with a real disk on the SAN

Ø Typical example: Microsoft Cluster Services (MSCS)

Storage – VMFS vs. RDM


RDM

- RAW may give better performance

- RAW means more LUNs and more provisioning time

- Advanced features still work

VMFS (preferred method)

- Leverage templates and quick provisioning

- Fewer LUNs means you don’t have to watch Heap

- Scales better with Consolidated Backup

Skeleton of a VMFS

A VMFS holds files and has its own metadata

Metadata gets updated through

- Creating a file

- Changing a file’s attributes

- Powering on a VM

- Powering off a VM

- Growing a file

  • When metadata is updated, the VMkernel places a non-persistent SCSI reservation on the entire VMFS volume
  • Lock held on volume for the duration of the operation
  • Other VMkernels are prevented from doing metadata updates

VMFS 3 & SCSI Reservations

  • Concurrent-access filesystem
  • Most I/O happens simultaneously from all hosts
  • Filesystem metadata updates are atomic and performed by the requesting host, for example:

- Locking a file for read/write (e.g. a vmdk when powering on a VM)
- Creating a new directory or file
- Growing a file

For the time needed by the locking operation (NOT the metadata update itself), the LUN is reserved (i.e. locked for access) by a single host

SCSI Reservation Conflict – What it is

What happens if we try to perform I/O to a LUN that’s already reserved?

- A retry counter is decreased and the I/O operation is retried

- The retry is scheduled with a pseudo-random algorithm

- If the counter reaches 0, we have a SCSI reservation conflict

SCSI: 6630: Partition table read from device vmhba1:0:6 failed: SCSI reservation conflict (0xbad0022)

SCSI: vm 1033: 5531: Sync CR at 64

SCSI: vm 1033: 5531: Sync CR at 48

SCSI: vm 1033: 5531: Sync CR at 32

SCSI: vm 1033: 5531: Sync CR at 16

SCSI: vm 1033: 5531: Sync CR at 0

WARNING: SCSI: 5541: Failing I/O due to too many reservation conflicts

WARNING: SCSI: 5637: status SCSI reservation conflict, rstatus 0xc0de01 for vmhba1:0:6. residual R 919, CR 0, ER 3
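The retry behaviour described above can be sketched in a few lines of Python. This is only an illustration of the idea (a counter that is decreased on every conflict, with a pseudo-random delay before the next attempt), not the VMkernel's actual code. The function name, the `lun_is_reserved` callback, the default retry count and the delay range are all assumptions made for the example.

```python
import random
import time


class ReservationConflict(Exception):
    """Raised when the retry counter reaches 0 while the LUN is still reserved."""


def issue_io(lun_is_reserved, retries=4, rng=None, sleep=time.sleep):
    """Attempt an I/O against a LUN that may be reserved by another host.

    lun_is_reserved is a hypothetical callable that returns True while
    another host holds the reservation. Returns the number of attempts made.
    """
    rng = rng or random.Random()
    remaining = retries
    attempts = 0
    while True:
        attempts += 1
        if not lun_is_reserved():
            return attempts  # the I/O went through
        remaining -= 1  # the retry counter is decreased on every conflict
        if remaining <= 0:
            raise ReservationConflict("too many reservation conflicts")
        # Pseudo-random delay before retrying; the 0-10 ms range is arbitrary.
        sleep(rng.uniform(0.0, 0.01))
```

Randomising the delay keeps several hosts that hit the same reservation from retrying in lockstep.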

Who’s holding a SCSI Reservation?

One ESX host (persistent reservation)

- vmkfstools -L reserve : This should NEVER EVER be done

- Interaction with installed third-party management agents

Multiple ESX hosts, alternatively

- High latency/slow SAN

o Critical lock-passing between ESX hosts during VMotion

- SAN firmware slow in honoring SCSI reserve/release

o Synchronously mirrored LUNs

One non-ESX host

- LUN erroneously mapped to e.g. a Windows host

No host

- Persistent reservation held by the SAN

- Needs investigation by the SAN vendor

ESX Server Multipathing

Multipathing – vmhbaN:T:L:P notation

Determined at boot, install / rescan:

- N = adapter number

- T = target number (generally 1 SP = 1 target)

Determined by the SAN

- L = LUN ID

- SCSI identifier of the LUN (not shown here)

Determined at datastore or extent creation

- P = partition number (if 0 or absent = whole disk)
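As an illustration of the notation, the following Python sketch splits a vmhbaN:T:L:P name into its four fields. The `VmhbaPath` type and the `parse_vmhba` function are hypothetical names introduced for this example; a missing partition field is treated as 0 (whole disk), as described above.

```python
import re
from typing import NamedTuple


class VmhbaPath(NamedTuple):
    adapter: int    # N
    target: int     # T
    lun: int        # L
    partition: int  # P, 0 means the whole disk


_VMHBA_RE = re.compile(r"^vmhba(\d+):(\d+):(\d+)(?::(\d+))?$")


def parse_vmhba(name: str) -> VmhbaPath:
    """Parse a vmhbaN:T:L or vmhbaN:T:L:P name into its numeric fields."""
    match = _VMHBA_RE.match(name)
    if match is None:
        raise ValueError(f"not a vmhbaN:T:L[:P] name: {name!r}")
    n, t, l, p = match.groups()
    # An absent partition field means the whole disk, same as 0.
    return VmhbaPath(int(n), int(t), int(l), int(p) if p is not None else 0)
```

For example, `parse_vmhba("vmhba1:0:6")` yields adapter 1, target 0, LUN 6 and partition 0.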

Per-LUN Multipathing Failover Policy

VMware supports using only one path at a time

- MRU = Most Recently Used

- Fixed = choose a preferred path & failback to it

- With multiple ESX hosts or multiple LUNs, Fixed allows for manual load balancing between SPs

Never set up the Fixed policy with an active/passive SAN! Why?

Path Thrashing

Ø Only possible on active/passive SANs

Ø Host 1 needs access to the LUN through SP1

Ø Host 2 needs access to the LUN through SP2

Ø The LUN keeps being trespassed between SPs and it’s never available for I/O
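A toy model makes the effect easy to see. The Python sketch below assumes a deliberately simplified active/passive array in which only the SP that currently owns the LUN serves I/O, and any I/O arriving through the other SP forces a trespass. The function name and the host and SP labels are made up for the example.

```python
def count_trespasses(io_sequence, sp_for_host, initial_owner):
    """Count LUN trespasses in a simplified active/passive array model.

    io_sequence:   host names in the order they issue I/O
    sp_for_host:   dict mapping each host to the SP it insists on using
    initial_owner: SP that owns the LUN at the start
    """
    owner = initial_owner
    trespasses = 0
    for host in io_sequence:
        wanted_sp = sp_for_host[host]
        if wanted_sp != owner:
            # The I/O arrives on the non-owning SP, so the LUN is moved.
            owner = wanted_sp
            trespasses += 1
    return trespasses
```

With two hosts alternating I/O and pinned to different SPs, almost every I/O triggers a trespass; with both hosts on the same SP the count stays at zero.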



- LUNs presented on multiple Storage Processors

- Fixed path policy

o Failover on NO_CONNECT

o Preferred path policy

o Failback to preferred path if it recovers

- LUNs presented on a single Storage Processor

- MRU (Most Recently Used) path policy

o No preferred path policy, no failback to preferred path

Load Balancing

- Fixed (Preferred Path)

o 1st active path discovered or user configured

o Active/Active arrays only

- Most recently used (MRU)

o Active/Active arrays

o Active/Passive arrays
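The difference between the two policies comes down to what happens when a failed path recovers. The Python sketch below is a simplified model of that choice, not ESX code; the function name, the policy strings and the path labels are assumptions for the example.

```python
def select_path(policy, current, preferred, alive):
    """Pick the path to use for a LUN under a simplified Fixed/MRU model.

    policy:    "fixed" or "mru"
    current:   path currently in use
    preferred: preferred path (only consulted for "fixed")
    alive:     list of paths that are currently working, in discovery order
    """
    if policy not in ("fixed", "mru"):
        raise ValueError(f"unknown policy: {policy!r}")
    if policy == "fixed" and preferred in alive:
        return preferred  # Fixed: fail back as soon as the preferred path works
    if current in alive:
        return current    # otherwise stay on the path already in use
    if not alive:
        raise RuntimeError("no working path to the LUN")
    return alive[0]       # fail over to the first working path
```

Under "mru" the host stays on whatever path it failed over to, which is why that policy does not provoke trespasses on an active/passive array.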

Snapshot LUNs and Resignaturing

How VMware ESX Identifies Disks

Ø Each LUN has a SCSI identifier string provided by the SAN vendor

Ø The SCSI ID stays the same across different paths

Ø The VMkernel identifies disks with a combination of LUN ID, SCSI ID and part of the model string

# ls -l /vmfs/devices/disks/

total 179129968

-rwxrwxrwx 1 root root 72833679360 Nov 13 12:16 vmhba0:0:0:0

lrwxrwxrwx 1 root root 58 Nov 13 12:16 vmhba1:0:0:0 -> vml.020000000060060160432017002a547c3e7893dc11524149442035

lrwxrwxrwx 1 root root 58 Nov 13 12:16 vmhba1:0:1:0 -> vml.02000100006006016043201700a99d1c3bb9c5dc11524149442035

lrwxrwxrwx 1 root root 58 Nov 13 12:16 vmhba1:0:10:0 -> vml.02000a000060060160432017000db2f61d17d3dc11524149442035
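The vml names in the listing appear to encode exactly that combination. The Python sketch below splits a vml name using a layout inferred only from the three sample entries above (hex digits 2-5 for the LUN ID, the next 4 digits skipped, 32 digits of SCSI ID, and the remainder as ASCII model text). The layout is an assumption and not a documented format, and the names `VmlId` and `decode_vml` are hypothetical.

```python
from typing import NamedTuple


class VmlId(NamedTuple):
    lun_id: int
    scsi_id: str
    model_fragment: str


def decode_vml(name: str) -> VmlId:
    """Split a vml.* name using a layout inferred from the sample listing."""
    if not name.startswith("vml."):
        raise ValueError(f"not a vml name: {name!r}")
    hex_part = name[len("vml."):]
    if len(hex_part) < 42:
        raise ValueError(f"vml name too short: {name!r}")
    lun_id = int(hex_part[2:6], 16)  # assumed position of the LUN ID
    scsi_id = hex_part[10:42]        # assumed position of the SCSI identifier
    # The trailing hex digits are read as ASCII text from the model string.
    model_fragment = bytes.fromhex(hex_part[42:]).decode("ascii", errors="replace")
    return VmlId(lun_id, scsi_id, model_fragment)
```

Applied to the entry for vmhba1:0:10:0, this yields LUN ID 10 and the model fragment "RAID 5".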


Snapshot LUNs & Resignaturing – Key Facts

Ø ESX identifies objects in a VMFS datastore by path e.g. /vmfs/volumes//

Ø The VMFS UUID (aka signature) is generated at VMFS creation

Ø The VMFS header includes hashed information about the disk where it’s been created

The Check for Snapshot LUNs

- VMFS relies on SCSI reservations to acquire on-disk locks, which in turn enforce atomicity of filesystem metadata updates

- SCSI reservations don’t work across mirrored LUNs

- To avoid corruption, we need to prevent mounting a datastore and a copy of it at the same time

Ø On rescan, the information about the disk in the VMFS header metadata (m/d) is checked against the actual values

Ø If any of the fields doesn’t match, the VMFS is not mounted and ESX complains it’s a snapshot LUN

LVM: 5739: Device vmhba1:0:1:1 is a snapshot:

LVM: 5745: disk ID:

LVM: 5747: m/d disk ID:

ALERT: LVM: 4903: vmhba1:0:1:1 may be snapshot: disabling access. See resignaturing section in SAN config guide.
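The check itself amounts to a field-by-field comparison. The Python sketch below shows the idea only; the field names (`lun_id`, `scsi_id`) and the function name are placeholders, since the actual header layout is not given here.

```python
def mismatched_fields(header_fields, actual_fields):
    """Return the names of fields whose value in the VMFS header differs
    from what the device reports now. A non-empty result means the volume
    would be treated as a snapshot LUN and not mounted."""
    return sorted(
        name
        for name, recorded in header_fields.items()
        if actual_fields.get(name) != recorded
    )


def is_snapshot(header_fields, actual_fields):
    """True if any recorded field no longer matches the device."""
    return bool(mismatched_fields(header_fields, actual_fields))
```

A single differing field is enough, which is why a mere LUN ID change makes an otherwise untouched datastore look like a snapshot.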

LUNs Detected as Snapshots – Causes

Ø LUN ID mismatch

Ø SCSI ID change (e.g. LUN copied to a new SAN)

Ø They are effectively snapshots (e.g. DR site)

LUNs Detected as Snapshots – How to Fix

Are they mirrored/snapshot LUNs?

- If yes: will the ESX host(s) ever see both original and copy at the same time?

Yes – resignature

No – either allow snapshots or resignature

- If no: do multiple ESX hosts see the same LUN with different IDs?

Yes – fix the SAN config; if not possible allow snapshots

No – IDs permanently changed: either allow snapshots or resignature
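The decision tree above can be written down as a small Python function. This is only a restatement of the slide's questions; the function and parameter names are invented for the example, and the returned strings simply echo the options listed above.

```python
def snapshot_lun_action(is_copy, sees_original_and_copy=False,
                        ids_differ_across_hosts=False, san_config_fixable=True):
    """Suggest how to handle a LUN that ESX has flagged as a snapshot.

    is_copy:                 the LUN really is a mirror or snapshot
    sees_original_and_copy:  some host will see original and copy together
    ids_differ_across_hosts: hosts see the same LUN with different IDs
    san_config_fixable:      the SAN presentation can be corrected
    """
    if is_copy:
        if sees_original_and_copy:
            return "resignature"
        return "allow snapshots or resignature"
    if ids_differ_across_hosts:
        if san_config_fixable:
            return "fix the SAN config"
        return "allow snapshots"
    # Not a copy and consistent across hosts: the IDs changed permanently.
    return "allow snapshots or resignature"
```

For instance, a DR copy that will never be visible next to its original returns "allow snapshots or resignature".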

Resignaturing Issues

Never ever resignature while the VMs are running

- Resignaturing implies changing the UUID and the datastore name

- All paths to filesystem objects (vmdks, VMs) will become invalid!
