By default, hypervisors collect around 24 hours of ‘real-time’ data. If you want to collect more than that, you need to adjust your data retention settings. You may also need to alter the types of data captured to gather IOPS and throughput information, which is essential when sizing for the cloud.
Let’s use server ‘DYNCRM’ as the example. We can see that this system has been assigned (granted) 6 GB of memory and almost all of it is being ‘consumed’:
If we take a closer look, the hypervisor breaks memory down into three categories: ‘Granted’, ‘Consumed’, and ‘Active’. Notice the line near the bottom titled ‘Active’, which is around 0.8 GB (~13% of the total available). The question is: which number do you use when sizing for the cloud?
Before answering, let’s look at what Movere says:
According to Movere, on average this device is consuming 53.98% of its allotted memory. This goes up to only 55.98% when we factor in 3 standard deviations, indicating this is a very static workload.
How can the hypervisor report 0.8 GB, 1 GB and 6 GB while Movere tells us the device is using roughly 3.2 GB (53.98% of 6 GB)?
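To make the percentages concrete, the short sketch below converts them into gigabytes against the 6 GB granted to DYNCRM. It only uses the figures quoted above; the variable and function names are illustrative, and it makes no claim about how Movere computes its statistics internally.

```python
ALLOCATED_GB = 6.0          # memory granted to DYNCRM
MEAN_PCT = 53.98            # average utilisation quoted above
MEAN_PLUS_3SD_PCT = 55.98   # utilisation at mean + 3 standard deviations


def pct_to_gb(pct, allocated_gb=ALLOCATED_GB):
    """Convert a utilisation percentage into gigabytes of allocated memory."""
    return allocated_gb * pct / 100.0


# The gap between the two figures spans 3 standard deviations.
implied_sd_pct = (MEAN_PLUS_3SD_PCT - MEAN_PCT) / 3
mean_gb = pct_to_gb(MEAN_PCT)
peak_gb = pct_to_gb(MEAN_PLUS_3SD_PCT)

print(f"implied standard deviation: {implied_sd_pct:.2f}%")  # 0.67%
print(f"average usage: {mean_gb:.2f} GB")                    # 3.24 GB
print(f"mean + 3 SD:   {peak_gb:.2f} GB")                    # 3.36 GB
```

A spread of well under one percentage point per standard deviation is what makes this workload look so static.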
Shouldn’t the hypervisor know exactly what the guest is using?
The hypervisor knows when its physical memory is being used by a guest because the guest’s initial access to a page causes a fault. The hypervisor sees the fault and captures this as ‘active’ memory usage. Sounds good so far, but here’s the challenge: the hypervisor has no way of knowing when the physical host memory has been freed. This is because the hypervisor doesn’t have access to the operating system’s working set. As a result, the hypervisor cannot monitor changes as they occur.
So how does the host release memory and why is the data Movere collects better?
The host allocates memory based upon page requests from its guests [this is the difference between consumed (the 6 GB) and active (0.8 GB) memory]. This ‘active’ usage is not deallocated once it is freed by the guest, because the host can only see the fault caused by initial access. Now you might be saying, ‘Well, that’s OK because we know what was accessed.’ The problem is that if the guest re-uses previously allocated pages, the host won’t allocate any additional physical host memory to those pages. If the guest allocates ‘new’ pages, then the active number will increase. The hypervisor will continue doing this until all configured memory pages assigned to the guest have been consumed (the 6 GB). That doesn’t sound so bad, but this brings us to what the ‘active’ memory number really means.
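The behaviour described above can be sketched as a toy model. This is a deliberate simplification, not how any particular hypervisor is implemented: the class `ToyVM`, its methods, and the six-page guest are all hypothetical, with one page standing in for 1 GB. The point it illustrates is that the host’s count only ever grows on a first touch, while the guest’s own count rises and falls.

```python
class ToyVM:
    """Toy model: the host only notices the first touch of each guest page."""

    def __init__(self, granted_pages):
        self.granted_pages = granted_pages
        self.in_use = set()        # what the guest OS knows it is using
        self.host_backed = set()   # pages the host has backed with physical memory

    def allocate(self, page):
        if not 0 <= page < self.granted_pages:
            raise ValueError("page outside granted memory")
        self.in_use.add(page)
        # Only a first touch faults; re-touching a backed page changes nothing.
        self.host_backed.add(page)

    def free(self, page):
        # The guest releases the page, but the host sees no event at all.
        self.in_use.discard(page)

    def snapshot(self):
        return (len(self.in_use), len(self.host_backed))


vm = ToyVM(granted_pages=6)
history = []

for page in (0, 1, 2, 3):          # guest starts up and touches four pages
    vm.allocate(page)
history.append(vm.snapshot())

for page in (0, 1):                # guest frees two of them
    vm.free(page)
history.append(vm.snapshot())

for page in (0, 1):                # guest re-uses the same two pages
    vm.allocate(page)
history.append(vm.snapshot())

for page in (4, 5):                # guest touches two brand-new pages
    vm.allocate(page)
history.append(vm.snapshot())

for guest_view, host_view in history:
    print(f"guest in use: {guest_view}  host consumed: {host_view}")
```

In this model the guest’s view goes 4, 2, 4, 6 while the host’s view goes 4, 4, 4, 6: the free and the re-use are invisible from the outside, and the host figure climbs toward the full grant.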
Hypervisors calculate ‘active’ memory using a sampling approach. At the beginning of each sampling period, the hypervisor intentionally invalidates several random physical pages and starts to monitor access to them. At the end of the sampling period, the fraction of actively used memory is estimated from the fraction of invalidated pages that were re-accessed during the sampling period. In other words, the active memory is calculated from the outside using an approximation at the hypervisor level. But this won’t be accurate if the guest re-uses previously allocated pages that weren’t randomly invalidated as part of the sampling process. So even though memory usage is occurring, the host won’t be aware of ALL usage. This explains why the hypervisor says 0.8 GB and Movere says 3.2 GB.
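The sampling idea can be illustrated with a minimal simulation. Everything here is an assumption made for the sake of the example: the function name `estimate_active_pages`, the 1,000-page guest, the 20-page sample, and the 54% true usage are hypothetical values, not the parameters of any real hypervisor.

```python
import random


def estimate_active_pages(total_pages, touched_pages, sample_size, rng):
    """Estimate how many pages are active by watching a random sample."""
    # "Invalidate" a random sample of pages at the start of the period.
    sampled = rng.sample(range(total_pages), sample_size)
    # Count how many sampled pages the guest touched during the period.
    hits = sum(1 for page in sampled if page in touched_pages)
    # Scale the observed fraction up to the whole of guest memory.
    return total_pages * hits / sample_size


TOTAL_PAGES = 1000                 # hypothetical guest size, in pages
rng = random.Random(42)            # fixed seed so runs are repeatable
touched = set(rng.sample(range(TOTAL_PAGES), 540))  # guest really uses 54%

# Five consecutive sampling periods, 20 pages watched in each.
estimates = [
    estimate_active_pages(TOTAL_PAGES, touched, 20, rng) for _ in range(5)
]

print("true active pages:", len(touched))
print("sampled estimates:", estimates)
```

Each estimate depends entirely on which pages happened to be watched, so with a small sample the figure can land well away from the true value. Only when every page is watched does the estimate match what the guest itself would report.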
Collecting data from the working set is crucial to accurately predicting your memory usage. This is where memory is allocated and deallocated in real time. We don’t need to rely on sampling, waiting, or random invalidation because the VM tells us exactly what is being used.