Tuesday, 5 April 2016

StarWind Virtual SAN review - Part 3 - Disk and Cache types


StarWind delivers Virtual SAN with two main device types: Flat and LSFS.

A Flat device is basically a thick-provisioned image file placed on one of the disks in the Virtual SAN server, which can be presented as an iSCSI target. A Flat device does its job with minimal overhead and has proved to be a reliable no-frills choice. You can run it in a standalone or HA configuration.
In previous versions of StarWind Virtual SAN it was the only device option, until StarWind introduced LSFS.

LSFS stands for Log Structured File System. According to Wiki, "LSFS is a file system in which data and metadata are written sequentially to a circular buffer, called a log". It is quite an efficient solution when you run virtualized workloads on top of it, but in some scenarios it may not perform as well as you would expect. More on that below.

* For those of you who have never heard of LSFS (WAFL, CASL), here is a very short description of how it works.

All new writes in LSFS go to new space, that is, to a new log file. For instance, if a VM tries to overwrite, let's say, Block A, the LSFS device will not alter the existing Block A, but will write the data to a new log file and update the pointer to Block A in the metadata table. The old Block A becomes invalid and will be cleaned up during the garbage collection process.

Along with the invalid Block A, the LSFS log file may contain Block B, which is still valid because it was left intact. When garbage collection runs, it moves the valid blocks of data to new space and deletes the invalid blocks. So there is an expected performance impact during garbage collection, and you must be aware of it.
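To make the Block A / Block B example a bit more tangible, here is a minimal Python sketch of a log-structured store. It is a toy model I put together for illustration only, not StarWind's implementation: the class name, the in-memory lists standing in for log files and the `garbage_collect` method are all my own assumptions.

```python
class LogStructuredStore:
    """Toy model: writes are only ever appended; a pointer table tracks the live copy."""

    def __init__(self):
        self.logs = [[]]     # each log file is a list of (block_id, data) entries
        self.pointers = {}   # metadata table: block_id -> (log_index, offset)

    def new_log(self):
        """Start a new, empty log file; later writes land there."""
        self.logs.append([])

    def write(self, block_id, data):
        """Append to the newest log and repoint the block; old copies become invalid."""
        log_index = len(self.logs) - 1
        self.logs[log_index].append((block_id, data))
        self.pointers[block_id] = (log_index, len(self.logs[log_index]) - 1)

    def read(self, block_id):
        log_index, offset = self.pointers[block_id]
        return self.logs[log_index][offset][1]

    def is_valid(self, log_index, offset):
        """An entry is valid only if the pointer table still points at it."""
        block_id, _ = self.logs[log_index][offset]
        return self.pointers.get(block_id) == (log_index, offset)

    def garbage_collect(self, log_index):
        """Move still-valid blocks to the newest log and reclaim the old log."""
        if log_index == len(self.logs) - 1:
            self.new_log()   # never copy survivors back into the log being cleaned
        entries = self.logs[log_index]
        survivors = [entry for offset, entry in enumerate(entries)
                     if self.is_valid(log_index, offset)]
        self.logs[log_index] = []            # space reclaimed
        for block_id, data in survivors:     # extra IO: this is the GC cost
            self.write(block_id, data)
        return len(entries) - len(survivors)  # number of invalid blocks dropped


store = LogStructuredStore()
store.write("A", "a-v1")
store.write("B", "b-v1")        # both land in log 0
store.new_log()
store.write("A", "a-v2")        # overwrite goes to log 1; A in log 0 is now invalid
dropped = store.garbage_collect(0)
print(dropped, store.read("A"), store.read("B"))
```

Note that the copy of Block B during garbage collection is real extra IO, which is where the performance impact mentioned above comes from.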

So let’s start with the perks you get with LSFS:

1. LSFS is awesome for workloads with many small random Writes, and therefore it is ideal for virtual workloads, which mostly deliver a blended stream of random IOs. To improve Write performance it caches all Writes in RAM, re-orders and merges them into one big chunk of data, and pushes that data to the hard drives as sequential Write IOs. Even a regular 7.2K HDD can handle thousands of sequential Writes.

The following pic conveys the whole idea of LSFS. 





StarWind claims that LSFS on RAID5 with 3 x HDDs performs faster than a local datastore on RAID10 with 6 x HDDs. However, if you have mostly sequential IOs it may be preferable to use Flat images, although in a virtualised environment you get random IOs most of the time. Here is a nice White Paper with more details on this feature.
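The "cache, re-order, merge" step from point 1 can be sketched roughly as follows. This is only my illustration of the general idea: the function name `coalesce_writes`, the 4K default block size and the zero-padding are assumptions, not details taken from StarWind.

```python
def coalesce_writes(pending, block_size=4096):
    """Merge buffered random writes into one sequential chunk.

    pending: list of (logical_block_address, data_bytes) in arrival order,
             where each data_bytes is at most block_size long.
    Returns (chunk, index): the bytes to append to the log in a single
    sequential IO, and a map of logical block -> offset inside that chunk.
    """
    latest = {}
    for lba, data in pending:
        latest[lba] = data               # a later write to the same block wins

    chunk = bytearray()
    index = {}
    for lba in sorted(latest):           # re-order by logical address
        index[lba] = len(chunk)
        chunk += latest[lba].ljust(block_size, b"\0")   # pad to a full block
    return bytes(chunk), index


# Three random writes (two of them to block 90) become one 2-block chunk.
chunk, index = coalesce_writes([(90, b"x"), (7, b"y"), (90, b"z")], block_size=4)
print(len(chunk), index)
```

The point is that the disk sees one large append instead of several scattered small writes, and superseded writes never reach the disk at all.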

2. LSFS eliminates the RAID5/6 write penalty, since LSFS doesn't need to go through the Read-Modify-Write cycle: it writes full stripes to your RAID. This applies, for instance, to scenarios with separated Compute & Storage layers where StarWind is installed natively on Windows or when Virtual [...] This can also apply to Hyperconverged scenarios with VMware, as StarWind can either align its writes or take ownership of a raw device and set its own block parameters. Btw, by default local storage cannot be presented as an RDM device - see VMware KB1017530.
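For a feel of how much the write penalty matters, here is a back-of-the-envelope calculation in Python. It uses the usual textbook model of parity RAID (a small write costs one read and one write for the data block plus one read and one write per parity block), not any StarWind measurements, and the function names are mine.

```python
def rmw_ios(data_blocks, parity_disks=1):
    """Disk IOs for small writes done via Read-Modify-Write.

    Per block: read old data and old parity, then write new data and new parity.
    parity_disks=1 models RAID5 (4 IOs per write), parity_disks=2 RAID6 (6 IOs).
    """
    return data_blocks * 2 * (1 + parity_disks)


def full_stripe_ios(data_blocks, total_disks, parity_disks=1):
    """Disk IOs when the same data is written as full stripes (no reads needed)."""
    data_per_stripe = total_disks - parity_disks
    stripes = -(-data_blocks // data_per_stripe)   # ceiling division
    return stripes * total_disks                   # one write per disk per stripe


# 1000 small writes on a 3-disk RAID5
print(rmw_ios(1000))             # 4000 disk IOs with Read-Modify-Write
print(full_stripe_ios(1000, 3))  # 1500 disk IOs with full-stripe writes
```

Under these assumptions the same amount of data costs well under half the disk IOs when it is written as full stripes.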


3. Thin Provisioning. 

Not much to explain here, but I have to mention that it is a default feature of LSFS and it cannot be disabled.

4. Inline Deduplication.

Hash tables of all data blocks are stored in RAM. Before writing a new data block to the LSFS disk, the Virtual SAN server computes its hash and compares it with the hash table. If a match is found, only the metadata is updated, creating a pointer to the matching data block on the disk. Since all operations are done in RAM, it doesn't incur much performance overhead and in general decreases the number of actual Writes to disks. Interestingly, once deduplication is enabled it also deduplicates the L1 cache, thus letting you store significantly more data in RAM to serve IOs faster.
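The hash-then-compare flow can be sketched in a few lines of Python. Again, this is a simplified illustration and not StarWind's code: the class name is made up, and the use of SHA-256 is my assumption, since I don't know which hash function Virtual SAN actually uses.

```python
import hashlib


class DedupStore:
    """Toy inline deduplication: identical blocks are stored only once."""

    def __init__(self):
        self.hash_table = {}   # block digest -> physical block index (lives in RAM)
        self.disk = []         # blocks that were actually written
        self.block_map = {}    # metadata: logical block address -> physical index

    def write(self, lba, data):
        digest = hashlib.sha256(data).digest()   # assumed hash function
        physical = self.hash_table.get(digest)
        if physical is None:                     # new content: write it to disk
            self.disk.append(data)
            physical = len(self.disk) - 1
            self.hash_table[digest] = physical
        self.block_map[lba] = physical           # match: only metadata is updated

    def read(self, lba):
        return self.disk[self.block_map[lba]]


store = DedupStore()
store.write(0, b"same block")
store.write(1, b"same block")    # duplicate: no second disk write
store.write(2, b"other block")
print(len(store.disk))           # 2 physical blocks for 3 logical writes
```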

5. It is flash-friendly, as it reduces the number of re-writes and extends the lifespan of all-flash arrays.


There are some important things to keep in mind when you plan to use LSFS:

  • Under a heavy random write workload with small blocks (4-16K), LSFS can use up to 2.5-3 times the LSFS device size in disk space. So if you provision, for instance, a 1TB LSFS disk, you may need to place the LSFS device on a 2.5TB disk in the Virtual SAN server. This is a natural feature of all log-structured file systems, as it can take some time before old data is cleaned up by the garbage collection process. If the disk hosting the LSFS device runs out of free space, the LSFS device will switch to read-only (RO) mode.
  • Be aware that an LSFS device cannot be extended and its maximum size is limited to 11TB. However, it is not recommended to create such a big SW device anyway due to operational overhead: it takes longer to re-sync such devices in an HA configuration, and backup/restore time increases as well. This limit will be increased to 64TB in the next 3-4 months according to the StarWind Virtual SAN roadmap.
  • For each 1TB of LSFS disk space you need to provide 3.5GB of RAM to accommodate random Writes before they are re-ordered into sequential Writes.
  • When Virtual SAN is installed natively on a physical Windows server, only hardware RAID controllers are supported for now. So please get a decent RAID controller instead of running Windows Software RAID. According to the StarWind roadmap, they are bringing raw device mapping support in future builds.
  • Performance results of sequential Reads on LSFS can be inconsistent, as in the LSFS format these sequential Reads turn into random Reads. That means you need to have a clear understanding of what your workload consists of before you choose a device type.
  • It can take a while to mount an LSFS device after a Virtual SAN server reboot.
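The sizing rules from the list above are easy to turn into a quick calculator. The sketch below just encodes the numbers quoted in this post (up to 2.5-3x disk space, 3.5GB of RAM per 1TB, 11TB maximum device size at the time of writing); the function and parameter names are mine, and the RAM figure covers only the LSFS write buffer, not L1 cache or deduplication, so check StarWind's best practices before you buy hardware.

```python
def lsfs_sizing(device_tb, space_factor=3.0, ram_gb_per_tb=3.5, max_device_tb=11):
    """Rough LSFS sizing based on the figures quoted in this post.

    device_tb:     size of the LSFS device presented to the hosts, in TB
    space_factor:  worst-case on-disk overhead (2.5 to 3 under heavy random writes)
    ram_gb_per_tb: RAM needed per TB of LSFS space to buffer random writes
    """
    if device_tb > max_device_tb:
        raise ValueError(f"LSFS device size is limited to {max_device_tb}TB")
    return {
        "backing_disk_tb": device_tb * space_factor,
        "ram_gb": device_tb * ram_gb_per_tb,
    }


print(lsfs_sizing(1, space_factor=2.5))   # {'backing_disk_tb': 2.5, 'ram_gb': 3.5}
print(lsfs_sizing(4))                     # {'backing_disk_tb': 12.0, 'ram_gb': 14.0}
```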


Another substantial benefit of SW Virtual SAN compared with other storage solutions is that it gives you total control of the cache type and size for each storage device.

StarWind Virtual SAN allows you to create two types of cache:

Level 1 - this cache is created in RAM and is light-speed fast. It is recommended to use it in Write-Back mode to provide consistent performance during Write bursts. The more RAM you have, the bigger the cache you can create. It is all up to your budget. You can always start small and add more cache later on as needed.

Level 2 - this cache is normally stored on commodity SSD devices, though you can always go with NVMe SSDs. StarWind recommends using it in Write-Through mode. Thus, it will speed up Read IOs only, but your SSD will serve you longer, as it wears out more slowly in Write-Through mode. It will also offload random Read IO from the underlying storage, allowing it to deliver more IO for random Writes.
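If the difference between Write-Back and Write-Through is not obvious, the following generic sketch may help. It models the two modes in the textbook sense only: the class, its dict-based "backing" tier and the `flush` method are my own illustration and say nothing about how StarWind implements its L1 and L2 caches.

```python
class Cache:
    """Generic cache in front of a slower tier, in write-back or write-through mode."""

    def __init__(self, backing, write_back):
        self.backing = backing        # dict standing in for the slower storage tier
        self.write_back = write_back
        self.data = {}                # cached blocks
        self.dirty = set()            # blocks not yet persisted (write-back only)

    def write(self, key, value):
        self.data[key] = value
        if self.write_back:
            self.dirty.add(key)           # acknowledged now, persisted on flush
        else:
            self.backing[key] = value     # write-through: persisted immediately

    def read(self, key):
        if key not in self.data:          # miss: fetch from the slower tier
            self.data[key] = self.backing[key]
        return self.data[key]

    def flush(self):
        """Persist all dirty blocks to the slower tier."""
        for key in self.dirty:
            self.backing[key] = self.data[key]
        self.dirty.clear()


disk = {}
l1 = Cache(disk, write_back=True)
l1.write("block-1", b"data")
print("block-1" in disk)    # False: the write burst was absorbed by the cache
l1.flush()
print("block-1" in disk)    # True
```

Write-Back absorbs bursts because the slow tier is updated later, while Write-Through never holds data that exists only in the cache, which is why it helps Reads rather than Writes.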

Each disk type and each feature requires some RAM to be reserved, so please check the StarWind best practices and do the math properly to work out the minimum RAM requirements for your Virtual SAN configuration.

* I also had a couple of chats with StarWind engineers, and according to them the L2 cache doesn't show its best performance yet in the latest publicly available build. The new build promises to bring an optimised L2 caching algorithm. That said, in my performance tests the L2 cache was pretty useful, even though the performance boost it gave was not consistent across all tests.




2 comments:

  1. Thanks for this awesome post!

    I have a question.

    Why does LSFS turn sequential Reads into Random reads? Does this make any sense?

    I thought the main reason for LSFS is to turn random IOs into sequential IOs to improve performance?

    Best Regards
    Patrick

    Replies
    1. Hi Patrick,

      That's the nature of LSFS, as it doesn't overwrite data, but instead writes data to a new log and invalidates the old blocks.

      Imagine creating a file that consists of blocks A, B and C. All of them are written sequentially into log-1. When block B is changed, LSFS writes the new block B to the new log-2 file and updates the pointer for block B to the log-2 file. Later on, block C is updated and it ends up in log-3; again the pointer is updated with the new location of block C, and the old block C in log-1 is invalidated and will be cleaned up during the garbage collection process.

      Also, the garbage collection process will move block A to a new log file and will delete blocks B and C from the log-1 file. So your sequentially written file turns into chunks of data spread randomly across the LSFS logs.

      Therefore, when you retrieve the file you get a number of random Reads, as you read the file from different LSFS log files.

      LSFS eliminates the Write penalty, and that's one of the heavy hitters on the storage. But if you have a mostly Read pattern on the storage, you may want to consider using Flat storage instead.
