Is your VMware humming, or is it leaking oil?


You’re paying good money for those VMware licenses, and the promise is that they will help you maximize resource utilization while decreasing management costs and providing high availability. However, many VMware environments are configured sub-optimally, designed haphazardly, or over-subscribed, any of which may be putting your company at risk. Is yours?

Common costly issues

I am sure every VMware administrator has experienced their share of vSphere issues, but these are the high-level problem areas I come across most often:

Design…what design?

I have seen many VMware environments that started with the free version. Then what happens? It grows and becomes a poorly designed production VMware installation. This causes the admins to spend more time managing it and typically creates configurations with a much higher chance of downtime.

No growth plan

Many environments look at their current resources and provision ESXi hosts for an N+1 configuration. However, they don’t plan for the massive growth that usually follows once virtualization is in use. This often means that IT managers have to renegotiate VMware licensing and purchase new hardware that was not budgeted.

Lack of Standards and Planning for VM deployment

It is all too common for too many people to have access to the VMware environment with no rules in place on how to use it. In this situation, bad things happen. For example, VMs end up being placed in the wrong locations on storage resources or on the wrong storage altogether. Of course, this can be very bad if the VM was important but the directory it was created in was not backed up.

Poor Monitoring

Unfortunately in this age of modern IT, it is still common for infrastructure to be monitored poorly or not at all. It is very important to monitor all aspects of your VMware environment to ensure proper performance and avoid typical growth issues.

Over-subscribing your cluster

I have seen many environments that have too many virtual machines provisioned for the cluster to still provide N+1 high availability. To make things worse, there are typically a number of virtual machines that are no longer being used but are still taking up resources and putting production virtual machines at risk. This issue is often overlooked because there are no immediately obvious problems, but it becomes a huge issue when there is a hardware problem with one of the hosts.
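As a rough illustration of the N+1 idea, here is a minimal Python sketch that checks whether the cluster could still hold all VM memory if its largest host failed. The `Host` class, the function name, and the sample numbers are hypothetical, and the check deliberately ignores CPU, hypervisor overhead, memory overcommit, and vSphere HA admission control settings, so treat it as a back-of-the-envelope test rather than a sizing tool.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    memory_gb: float  # usable physical memory on the host (assumed figure)


def survives_single_host_failure(hosts, vm_memory_demand_gb):
    """Return True if the remaining hosts can hold all VM memory
    after the largest host fails (a simplified N+1 test)."""
    if len(hosts) < 2:
        return False  # a single host has no failover capacity
    total = sum(h.memory_gb for h in hosts)
    largest = max(h.memory_gb for h in hosts)
    return vm_memory_demand_gb <= total - largest


# Hypothetical three-host cluster with 256 GB per host.
hosts = [Host("esx01", 256), Host("esx02", 256), Host("esx03", 256)]
print(survives_single_host_failure(hosts, 480))  # True: fits on two hosts
print(survives_single_host_failure(hosts, 600))  # False: over-subscribed
```

Removing the worst-case (largest) host is a conservative choice; feeding in the memory actually assigned to powered-on VMs, including the forgotten ones, shows how much headroom the unused machines are consuming.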

Another common mistake is to size the virtual machines incorrectly. If you combine this mistake with over-subscription, you may run into CPU Ready issues. CPU Ready refers to the amount of time a virtual machine is ready to use CPU but cannot be scheduled because all CPU resources are busy. Now your beefy virtual machine is suddenly not performing well even though you have given it plenty of resources. (In this case, too many for the available resources.)
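vSphere performance charts report CPU Ready as a summation in milliseconds per sample interval, which is hard to judge at a glance. The sketch below converts that value to a percentage. It assumes the 20-second interval used by real-time charts as the default and divides by the vCPU count to get a per-vCPU average; the function name and the sample values are illustrative only.

```python
def cpu_ready_percent(ready_ms, interval_s=20, vcpus=1):
    """Convert a CPU Ready summation (milliseconds) into an average
    percentage per vCPU for one sample interval.

    interval_s defaults to 20, the real-time chart interval; longer
    rollups (for example the past-day chart) use larger intervals.
    """
    if interval_s <= 0 or vcpus <= 0:
        raise ValueError("interval_s and vcpus must be positive")
    return ready_ms * 100 / (interval_s * 1000 * vcpus)


# 2000 ms of ready time in a 20 s sample on a 1-vCPU VM.
print(cpu_ready_percent(2000))           # 10.0
# The same summation spread over 4 vCPUs.
print(cpu_ready_percent(2000, vcpus=4))  # 2.5
```

A value that looks alarming in milliseconds may be modest as a percentage, and the reverse is also true, so it helps to compare VMs on the same scale before resizing anything.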

Check your VMware Vitals

Here are some ways to check the vital signs.  If any of these are in the red, you can stand to save money and help avoid disasters by fixing them!

Review your VMware environment’s health

There are multiple ways to do this. A freeware script is available from the VMware communities here: https://communities.vmware.com/docs/DOC-9842. VMware also has an official health analyzer tool, but it is only available to the partner community. With this tool, any VMware partner can collect the relevant data in a VMware-approved manner.

However, collecting the data is the easy part. Once you have collected it, you should analyze it and put together an action item list to bring your environment back to health. I have yet to see a report where there was not something to fix.

Review your processes and procedures around VMware management

Even if your VMware environment is healthy, unhealthy processes around its usage can make things go belly-up quickly. Unruly users with too much access can make easy mistakes, like using all of the storage or misconfiguring networking.

Either lock down your VMware environment so only qualified administrators can make changes, or deploy a self-service solution that allows users to make changes without breaking the environment.

Ensure your design has been properly architected

This point is especially important for companies with deployments that were never really planned. Have you taken the time to examine resource allocation and usage? Examine your VM sizing and placement to ensure all virtual machines are configured to take advantage of your hardware resources. Ensure your virtual machine deployment is configured to avoid CPU Ready issues.

Has your licensing plan been upgraded, but your configuration does not take advantage of clusters and distributed switches? I have seen situations where the business has upgraded licensing, but has not made the investment to actually upgrade the design and configurations to take advantage of their licensing investment.

Examine your Monitoring Solution

If you have monitoring data, review all of the details of the data collected. Sometimes important details are not red-flagged but should still be addressed. Also, review the monitoring configuration to ensure all necessary data is being collected.

Take Action!

VMware vitals looking a bit sour? Don’t despair! This only means you have room to improve. After remediation, you should have a more stable, manageable, and usable environment.

  • This may seem like a simple one, but I have seen companies sit on their health reports and not implement changes. Make the changes and ensure you have a well-configured VMware environment. If you are uncomfortable making the changes, get some experienced help. It is generally cheaper than production downtime.
  • Ensure your processes for managing your vSphere infrastructure are documented and all interested parties understand them. If you are able to lock the environment down, it is best to restrict access by group and grant each group only the access it requires.
  • Consider automation. With vRealize Automation and vRealize Orchestrator, virtualization self-service can be set up to help avoid most of the pitfalls of a wild west environment.
  • Take a serious look at your current design and business requirements. It is usually cheaper to plan ahead and provision correctly than to try to fix things after your infrastructure has failed to meet demand. Collect data around the company’s goals for virtualization, and make a plan to ensure the VMware environment can adapt to meet the business needs.
  • Ensure you have monitoring in place and that it is properly configured. This is the only way to gain real insight into what is happening in your environment, to be notified when there are problems, and to build up historical data for resource planning.

Tell us more.

Got your own tips and tricks for optimizing VMware? Post here in the comments. Have any questions? Give us a ring, we’d love to share more with you!