Get Started with vSphere 5.1 Storage I/O Control (SIOC)
Virtual environments can resemble real life when sharing a limited resource: some consumers act greedily and take more than their fair share, depriving others of that resource and degrading their performance. A good example is a data-mining VM or a backup appliance consuming most of the IOPS (Input/Output Operations Per Second) a shared storage array can offer, choking the more important Online Store and Email Server.
A diagram from VMware.com showing what SIOC aims to achieve.
This article explains VMware's solution to this problem, called Storage I/O Control (SIOC), and how it enhances the disk shares that can be assigned to each VM. We will also enable these features using the vSphere Web Client.
The problem with setting disk shares
Disk shares like memory and CPU shares are set by editing the settings of each virtual machine.
Expanding each virtual hard disk's settings exposes the Shares drop-down menu, which offers the following options: Low (500 shares), Normal (1000 shares), High (2000 shares), or a Custom value for more granular needs.
It's worth noting that since each virtual disk has its own shares, a VM with two virtual disks will normally get twice the share of a VM with a single virtual disk. You can also give the virtual disk that contains your database logs more shares than your OS or backup disks.
Disk shares, like most fairness mechanisms, do not kick in unless there is congestion on the resource. As long as there is enough for everybody, nothing prevents any VM from consuming what it wants, unless you specify a hard limit on its maximum number of IOPS. This limit, which can be set from the same screen, applies even when there is no contention on the storage.
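To make the shares-versus-limits distinction concrete, here is a toy sketch (not VMware's code) of how disk shares would divide a datastore's IOPS under contention. The disk names and the total of 3500 IOPS are hypothetical; the share values follow the UI presets (Low = 500, Normal = 1000, High = 2000).

```python
def allocate_iops(total_iops, disks):
    """Split total_iops across virtual disks in proportion to their
    shares. A hard IOPS limit, when set, caps a disk even when there
    is no contention (a real scheduler would also redistribute the
    capped disk's unused IOPS)."""
    total_shares = sum(d["shares"] for d in disks)
    return {
        d["name"]: min(
            total_iops * d["shares"] / total_shares,
            d.get("limit", float("inf")),
        )
        for d in disks
    }

disks = [
    {"name": "db_logs", "shares": 2000},                # High
    {"name": "os",      "shares": 1000},                # Normal
    {"name": "backup",  "shares": 500, "limit": 300},   # Low, hard-capped
]
print(allocate_iops(3500, disks))
# db_logs gets 2000, os gets 1000; backup's fair share of 500
# is clipped to its 300-IOPS hard limit
```

Note how the limit is an absolute ceiling while shares are purely relative: double everything's shares and the allocation does not change.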
That is great so far. But unlike CPU and memory, which are most often shared between VMs on the same host, storage resources are shared between VMs on different hosts. To understand why this can be a problem, consider the simple case of three VMs, each with a single virtual disk and Normal shares.
If the three VMs run on a single host, each gets its fair share of the storage IOPS. However, if one of the VMs runs on another host in the cluster, it gains an unfair advantage over the other two, as it has that host's storage controller queue to itself. This enables it to send twice as many IOPS to the shared storage as either of the other two VMs.
An example of imbalance before applying SIOC - from VMware.com
SIOC throttles VM IOPS by modifying the device queue depth on each host, forcing every VM to adjust its IOPS to match its configured disk shares relative to all other VMs, on all hosts, that share the same LUN.
The same VMs after applying SIOC - from VMware.com
This example clearly shows that even if we do not want to bother with setting custom disk shares and simply aim to provide equal storage IOPS to our VMs, we still need a mechanism like SIOC to ensure fairness when it comes to shared storage.
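The imbalance in the three-VM example above can be sketched numerically. This is an illustration of the idea, not ESXi internals: without SIOC each host gets the same device queue regardless of how many VMs it runs, while SIOC sizes each host's queue in proportion to the total shares of the VMs on it. The 64-slot array queue and the host layout are hypothetical.

```python
ARRAY_QUEUE = 64  # slots the array effectively serves per interval (made up)

def per_vm_slots(hosts, sioc=False):
    """hosts: {host_name: [vm_shares, ...]}. Returns queue slots per VM.
    Without SIOC every host gets an equal device queue; with SIOC each
    host's queue is sized in proportion to its VMs' total shares."""
    total_shares = sum(sum(vms) for vms in hosts.values())
    result = {}
    for host, vms in hosts.items():
        if sioc:
            host_q = ARRAY_QUEUE * sum(vms) / total_shares
        else:
            host_q = ARRAY_QUEUE / len(hosts)
        for i, shares in enumerate(vms):
            result[f"{host}-vm{i}"] = host_q * shares / sum(vms)
    return result

# Three identical VMs with Normal (1000) shares: two on host A, one on host B.
print(per_vm_slots({"A": [1000, 1000], "B": [1000]}))
# B's lone VM gets twice the slots of each VM on A
print(per_vm_slots({"A": [1000, 1000], "B": [1000]}, sioc=True))
# with SIOC, all three VMs get equal slots
```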
What triggers SIOC to kick in?
SIOC keeps track of a performance metric called normalized latency, compares it with the congestion latency threshold every four seconds, and kicks in only if the normalized latency is above the threshold. SIOC calculates this normalized latency by factoring in the latency values of all hosts and all VMs accessing the datastore, taking different I/O sizes into account (larger I/Os mean higher observed latency than smaller ones).
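The idea behind normalized latency can be sketched as an I/O-count-weighted average across hosts, with large I/Os discounted so their naturally higher latency does not skew the result. The formula, the 8 KB reference size, and the sample numbers below are illustrative assumptions, not VMware's exact math.

```python
REFERENCE_IO_KB = 8  # assumed baseline I/O size for normalization

def normalized_latency(samples):
    """samples: list of (latency_ms, io_count, avg_io_size_kb) tuples,
    one per host accessing the datastore. Latency contributed by
    larger-than-baseline I/Os is scaled down proportionally."""
    total_ios = sum(n for _, n, _ in samples)
    weighted = sum(
        lat * (REFERENCE_IO_KB / max(size, REFERENCE_IO_KB)) * n
        for lat, n, size in samples
    )
    return weighted / total_ios

def sioc_should_throttle(samples, threshold_ms=30.0):
    # SIOC evaluates this comparison every four seconds
    return normalized_latency(samples) > threshold_ms

# Host 1: many small I/Os at 45 ms; host 2: few large 64 KB I/Os at 20 ms
print(sioc_should_throttle([(45.0, 800, 8), (20.0, 200, 64)]))  # True
```

Because the large sequential I/Os from the second host are discounted, the high latency seen by the small-I/O workload dominates the metric and crosses the 30 ms threshold.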
This latency value is also used by Storage DRS (SDRS) when deciding whether to recommend a Storage vMotion based on I/O metrics. This dependency explains why SDRS automatically enables SIOC on all member datastores of a cluster when "I/O metric for SDRS recommendations" is enabled. In fact, vSphere 5.1 automatically enables SIOC on all datastores in "stats only" mode to gather performance statistics, so that it is ready if a datastore is later added to a datastore cluster, since SDRS uses the last 24 hours' average to make recommendations.
Although SIOC is enabled through a vCenter Server, latency values are actually stored on the datastore itself. This allows SIOC to keep functioning even if vCenter goes down or becomes unavailable after SIOC is enabled.
Storage IO Control requirements and limitations:
- SIOC is an Enterprise Plus feature only and is not available for any lower edition of vSphere.
- To enable SIOC on a datastore it must be managed by a single vCenter Server system.
- vCenter Server and all hosts connected to the datastore must be running vSphere 4.1 or greater.
- SIOC supports both block-based storage (such as iSCSI and Fibre Channel) and NFS storage, but does not support Raw Device Mappings (RDMs) or LUNs with multiple extents.
Enabling Storage IO Control
To enable SIOC, browse to the datastore in the vSphere Web Client.
Click on the Manage tab.
Click Settings, then General; the status of SIOC appears under "Datastore Capabilities."
Clicking Edit opens a one-step wizard to enable SIOC.
Notice that the latency threshold can be set manually (from 5 ms to 100 ms); before vSphere 5.1 this was the only option. The problem with setting it manually is that if you set it too low, SIOC starts to throttle IOPS too early and decreases the throughput of the storage; if you set it too high, your VMs suffer before SIOC takes corrective action.
The question is: how do you know the best latency to set for your particular SAN, when not all storage devices are created equal?
For SSD-based storage, a 15 ms latency may be considered high, while a SATA-based backend can experience latency values of up to 50 ms before it reaches its optimal throughput.
In the past, VMware suggested a default threshold of 30 ms as "a value that should work well for most users" and provided the following table as a guideline for changing this value if needed. Still, the characteristics of your workload and your own preferences play a part in determining the best value for your needs.
VMware SIOC recommendations for congestion thresholds of different SAN types
A major enhancement in vSphere 5.1 replaces the manual setting of the congestion threshold with a percentage of peak throughput that is automatically determined for each device by injecting I/Os and observing both latency and throughput. When the throughput of the LUN stops increasing, it has reached its maximum, and the latency corresponding to 90% of that maximum throughput is used as the threshold.
Of course, you can still customize this by changing the percentage of maximum throughput to aim for, but at least SIOC now finds that maximum for you automatically, as shown in the adjacent graph.
Another change relates to metrics: SIOC now observes latency between the VM and the datastore. Previously, it ignored latencies inside the ESXi host itself and only considered the latency observed between the host and the datastore. The new VmObservedLatency metric is more representative of what the VM actually sees and gives virtualization admins a better perspective.
Storage I/O Control (SIOC) is an Enterprise Plus feature used to control the I/O usage of virtual machines and to gradually enforce predefined I/O share levels according to business needs. Even if equal shares are desired, fairness cannot be guaranteed between VMs on different hosts without SIOC.
It is supported on Fibre Channel, iSCSI, and NFS storage, and can automatically determine the best latency threshold to achieve maximum throughput. This lets you make the most of your shared storage and helps you manage a better virtual environment.
This feature is a must-have for organizations that want to achieve higher consolidation ratios or host VMs for multiple tenants (as in public clouds), as it reduces the effect of a noisy neighbor hogging storage resources at the expense of the well-behaved VMs.