Saturday, 26 September 2015

ESXi's killer feature is under threat


I know it has been this way for a while, but I have only just learnt about it.

So, apparently Transparent Page Sharing (TPS) is now disabled by default. Here is the list of the patches and ESXi builds where TPS was disabled:

  • ESXi 5.0 Patch ESXi500-201502001, released on February 26, 2015 
  • ESXi 5.1 Update 3 released on December 4, 2014 
  • ESXi 5.5, Patch ESXi550-201501001, released on January 27, 2015
  • ESXi 6.0

This has been one of my favourite ESXi features, and I have always taken advantage of it. Even after Nehalem CPUs were released and Large Pages made TPS mostly useless, I still preferred to disable Large Pages to get a better understanding of memory usage on my systems. Some VMware white papers stated that Large Pages bring a CPU performance increase of about 15% to 20%, but I could never reproduce those results in my environments.


So, why did VMware make this decision?

According to KB2080735, academic researchers "have demonstrated that by forcing a flush and reload of cache memory, it is possible to measure memory timings to try and determine an AES encryption key in use on another virtual machine running on the same physical processor of the host server if Transparent Page Sharing is enabled between the two virtual machines". Sounds pretty dangerous, huh?

However, VMware then says "VMware believes information being disclosed in real world conditions is unrealistic" and "This technique works only in a highly controlled system configured in a non-standard way that VMware believes would not be recreated in a production environment."

I understand that VMware prefers a "better safe than sorry" approach, and that is fair enough, given how huge the reputational damage would be if the flaw were ever exploited in a real production environment.


What exactly was changed and how?

TPS is disabled only for inter-VM memory sharing. Memory pages within a single VM are still shared, though this provides significantly smaller savings from memory deduplication.

To be more specific, the memory sharing feature is not actually disabled. VMware introduced a so-called salting concept, which lets an ESXi host deduplicate two identical memory pages in different virtual machines only when their salt values are the same.

This new concept is enforced by the new configuration setting Mem.ShareForceSalting=1. Setting this option to 0 removes the salting requirement and allows inter-VM memory sharing to work as it did before the security patches were applied.

If you want to specify a salt value per VM, here are the steps from VMware KB2091682:
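
To make the salting rule concrete, here is a toy model in Python. It is only a sketch of the behaviour described above, not how ESXi implements it: the names Page, share_key and can_share are made up for illustration, SHA-256 simply stands in for whatever page comparison the hypervisor really does, and only the values 0 and 1 of the setting are modelled.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    content: bytes  # raw page contents
    salt: str       # salt of the VM that owns the page (hypothetical field)


def share_key(page: Page, force_salting: int) -> tuple:
    """Return the key under which a page may be deduplicated."""
    digest = hashlib.sha256(page.content).hexdigest()
    if force_salting == 0:
        # Salting not required: identical content is enough.
        return (digest,)
    # Salting required: content AND salt must both match.
    return (digest, page.salt)


def can_share(a: Page, b: Page, force_salting: int) -> bool:
    """True if the two pages could be collapsed into one physical page."""
    return share_key(a, force_salting) == share_key(b, force_salting)


zero_page = bytes(4096)
vm1 = Page(zero_page, salt="salt-a")
vm2 = Page(zero_page, salt="salt-b")
vm3 = Page(zero_page, salt="salt-a")  # deliberately given the same salt as vm1

print(can_share(vm1, vm2, force_salting=1))  # False: same content, different salts
print(can_share(vm1, vm3, force_salting=1))  # True: same content, same salt
print(can_share(vm1, vm2, force_salting=0))  # True: salt is ignored
```

The point of the model is that the salt becomes part of the match condition, so two VMs share pages only if somebody deliberately gave them the same salt.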

  1. Log in to ESXi or vCenter with the VI Client.
  2. Select the relevant ESXi host.
  3. In the Configuration tab, click Advanced Settings under the Software section.
  4. In the Advanced Settings window, click Mem.
  5. Look for Mem.ShareForceSalting and set the value to 1.
  6. Click OK.
  7. Power off the VM for which you want to set the salt value.
  8. Right-click the VM and click Edit Settings.
  9. Select the Options menu and click General under the Advanced section.
  10. Click Configuration Parameters…
  11. Click Add Row; a new row will be added.
  12. On the left side add the text sched.mem.pshare.salt and on the right side specify the unique string.
  13. Power on the VM for the salting to take effect.
  14. Repeat steps 7 to 13 to set the salt value for individual VMs.
  15. The same salt value can be specified on several VMs to allow page sharing across them.
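
If you have many VMs, clicking through steps 7 to 13 gets old quickly. Below is a minimal Python sketch of the text manipulation involved, assuming the configuration parameter ends up as a key = "value" line in the VM's .vmx file, as such parameters normally do. The function name set_vmx_salt and the sample VM settings are hypothetical; the script only transforms text and does not touch a host, and the VM would still need to be powered off while its configuration is changed.

```python
SALT_KEY = "sched.mem.pshare.salt"


def set_vmx_salt(vmx_text: str, salt: str) -> str:
    """Return vmx_text with the salt parameter added or replaced."""
    new_line = f'{SALT_KEY} = "{salt}"'
    out = []
    replaced = False
    for line in vmx_text.splitlines():
        # The key is everything before the first "=" sign.
        key = line.split("=", 1)[0].strip()
        if key == SALT_KEY:
            out.append(new_line)  # overwrite an existing salt
            replaced = True
        else:
            out.append(line)      # keep every other setting untouched
    if not replaced:
        out.append(new_line)      # no salt yet: append one
    return "\n".join(out) + "\n"


sample = 'displayName = "web01"\nmemSize = "4096"\n'
print(set_vmx_salt(sample, "tenant-a"), end="")
```

Giving the same salt string to a group of VMs you trust equally (for example, all VMs of one tenant) is what step 15 is about.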

What impact may it have on your environment?

If you rely on TPS to overprovision your environment, and your performance stats show that the assigned virtual memory is larger than your physical memory, be really careful and make a decision on TPS before you update your hosts.

Otherwise you risk seeing all the other VMware memory management features in action: ballooning, compression and swapping. These are pretty cool features for sure, but you don't wanna see them in your production environment.
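
A quick back-of-the-envelope check can help here. The Python sketch below is a deliberately pessimistic estimate with made-up numbers: it assumes that all current TPS savings are lost after the update (in reality intra-VM sharing remains) and it ignores hypervisor overhead. The function name and the figures are hypothetical; plug in the values from your own performance stats.

```python
def tps_impact(physical_gb: float, assigned_gb: list[float],
               consumed_gb: float, tps_savings_gb: float) -> dict:
    """Rough estimate of host memory pressure if TPS savings disappear."""
    total_assigned = sum(assigned_gb)
    # Memory the host would have to back once shared pages are no longer shared.
    demand_without_tps = consumed_gb + tps_savings_gb
    return {
        "overcommit_ratio": total_assigned / physical_gb,
        "demand_without_tps_gb": demand_without_tps,
        # Negative headroom means ballooning, compression or swapping is likely.
        "headroom_gb": physical_gb - demand_without_tps,
    }


report = tps_impact(
    physical_gb=256,
    assigned_gb=[64, 64, 96, 48, 32],  # memory assigned to each VM
    consumed_gb=230,                   # host memory consumed today, with TPS
    tps_savings_gb=40,                 # memory currently saved by page sharing
)
print(report)
```

With these example numbers the host is overcommitted by a factor of about 1.19 and would end up 14 GB short without inter-VM sharing, which is exactly the situation to sort out before patching.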


What should I do now?

I am not an IT security guy, but as far as I understand, this security risk mostly applies to multitenant environments where virtual machines belong to different companies. It can also be a risk where the security requirements for the vSphere farm are significantly higher, e.g. in the banking and defence industries. So you should probably check your security policies before re-enabling TPS.

However, in most other companies re-enabling TPS doesn't seem to be a big issue, in my opinion. Just make sure it is an educated choice.
