VMware has a KB article detailing a bug in ESXi 5.0 that has been known to cause a variety of networking issues in iSCSI environments. Until last week I had not encountered this particular bug, so I thought I’d detail my experience troubleshooting it for those still on 5.0 who may run into it.
The customer I was working with had originally called for assistance because their storage array was only reporting 2 out of 4 available paths “up” to each connected iSCSI host. All paths had originally been up/active until a recent power outage, and since then no amount of rebooting or disabling/re-enabling had been successful in bringing them all back up simultaneously. Their iSCSI configuration was fairly standard: two iSCSI port groups connected to a single vSwitch per server, with each port group connected to a separate iSCSI network. Each port group in this configuration has a different NIC specified as an “Active Adapter,” with the other placed under the “Unused Adapters” heading.
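For reference, this active/unused override can be inspected and set from the command line. The sketch below assumes hypothetical port group names (iSCSI-A, iSCSI-B) and uplinks (vmnic2, vmnic3); substitute your own.

```shell
# Each iSCSI port group should show exactly one Active adapter and the
# other NIC under Unused (not Standby):
esxcli network vswitch standard portgroup policy failover get -p iSCSI-A
esxcli network vswitch standard portgroup policy failover get -p iSCSI-B

# Setting the override explicitly (one active uplink, nothing on standby):
esxcli network vswitch standard portgroup policy failover set -p iSCSI-A -a vmnic2
esxcli network vswitch standard portgroup policy failover set -p iSCSI-B -a vmnic3
```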
One of the first things I wanted to rule out was a hardware issue related to the power outage. However, after not much time troubleshooting, I quickly discovered that simply disabling and re-enabling NIC ports on the iSCSI switches would cause the “downed” paths to become active again within the storage array, while the path that was previously “up” would go down. As expected, a vmkping was never successful through a NIC that was not registering properly on the storage array. Everything appeared to be configured correctly within the array, the switches and the ESXi hosts, so at this point I had no clear culprit and needed to rule out potential causes. Luckily these systems had not been placed into production yet, so I was granted a lot of leeway in my troubleshooting process.
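The vmkping checks looked roughly like the following sketch; the target IPs are hypothetical stand-ins for the array’s iSCSI portals.

```shell
# Ping the array's target portal on each iSCSI network from the host:
vmkping 192.168.10.50        # portal on iSCSI network A
vmkping 192.168.20.50        # portal on iSCSI network B

# On newer ESXi builds you can force the test out a specific vmkernel
# port, which is exactly what you want when verifying multipath iSCSI:
vmkping -I vmk2 192.168.10.50
```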
- Test #1. For my first test I wanted to rule out the storage array. I was working with this customer remotely, so I had them unplug the array from the iSCSI switches and plug it into a spare Linksys switch they had lying around. I then had them plug their laptop into the same switch and assign it an IP address on each of the iSCSI networks. All ping tests to each interface were successful, so at this point I was fairly confident the array was not the cause of the issue.
- Test #2. For my second test I wanted to rule out the switches. I had the customer plug all array interfaces back into the original iSCSI switches, then unplug a few ESXi hosts from those switches. They then assigned their laptop the same IP addresses as the unplugged ESXi hosts’ iSCSI port groups and ran additional ping tests from the same switch ports the ESXi hosts had been using. All ping tests on every interface were successful, so it appeared unlikely that the switches were the culprit.
At this point it appeared almost certain that the ESXi hosts were the cause of the problem. They were the only component that appeared to be having any communication issues, as all other components taken in isolation communicated just fine. It was also evident that something with the NIC failover/failback wasn’t working correctly (given the behavior when we disabled/re-enabled ports), so I moved the iSCSI port groups onto separate vSwitches. BINGO! Within a few seconds of doing this I could vmkping on all ports and the storage array was showing all ports active again. Given that this is not a required configuration for iSCSI networking on ESXi, I immediately started googling for known bugs. Within a few minutes I ran across this excellent blog post by Josh Townsend and the KB article I linked to above. The bug causes ESXi to actually send traffic down the “unused” NIC during a failover scenario.
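To confirm the fix from the ESXi side, you can watch the path state directly on the host. A hedged sketch (the device identifier shown is a placeholder, not from this environment):

```shell
# List all storage paths and their state (active/dead):
esxcli storage core path list

# Narrow the output to a single device once you know its identifier:
esxcli storage core path list -d naa.xxxxxxxxxxxxxxxx
```

After separating the port groups, all four paths should report active here, matching what the array sees.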
This is why separating the iSCSI port groups “fixed” the issue: there was no unused NIC left in the port group for ESXi to mistakenly send traffic to. It also explained the behavior where disabling/re-enabling a downed port would cause it to become active again (and vice versa). In this case ESXi was sending traffic down the unused port, and my disable/re-enable triggered a failover that caused ESXi to send traffic down the active adapter again.
In my case, upgrading to ESXi 5.0 Update 1 completely fixed the issue. I’ll update this post if I run across this problem on any other version of ESXi; until then, note the workaround I described above and outlined in both links.
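If you are not sure whether a host is still on the affected release, a quick check from the shell:

```shell
# Report the ESXi version and build number (fixed in 5.0 Update 1):
esxcli system version get

# Equivalent quick check:
vmware -vl
```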
Which is better, Citrix XenDesktop or VMware View? XenServer or ESXi? HDX or PCoIP? While the answers to these questions are debated on numerous blogs, at tech conferences and in marketing literature, what is explored far less often is how Citrix and VMware technologies can actually work together. What follows is a brief overview of some different ways these technologies can be combined, forming integrated virtual infrastructures.
1) Application and Desktop delivery with VMware View and XenApp
Many organizations deploying VMware View already have existing Citrix XenApp infrastructures in place. The View and XenApp infrastructures are usually managed by separate teams and not integrated to the degree they could be. Pictured above are some possible ways these two technologies can integrate. As you can see, there are many different options for application delivery in both environments. The most obvious is publishing applications from XenApp to your View desktops. This can reduce resource consumption on individual desktops and also provides the added benefit of accessing those same applications outside your View environment, with the ability to publish directly to remote endpoints as well. Existing Citrix infrastructures may also be using Citrix application streaming. By simply installing the Citrix clients on your View desktops, applications can be streamed directly to View desktops, directly to endpoints, or to XenApp servers and then published to View desktops or endpoints. Another option is to integrate ThinApp into this environment. Tina de Benedictis had a good write-up on this a while back. The options here are similar to Citrix streaming: you can stream to a XenApp server and publish the application from there, stream directly to your View desktops, or stream directly to endpoints. As shown in the picture above, both Citrix streaming and ThinApp can be used within the same environment. This might be an option if you’ve already packaged many of your applications with Citrix but either want to migrate to ThinApp over time, or need to package and stream certain applications that Citrix streaming cannot (e.g. Internet Explorer). Whichever options you choose, it’s clear that both technologies can work together to form a very robust application and desktop delivery infrastructure.
2) Load Balancing VMware infrastructures with Citrix Netscaler
Some good articles have been written about this option as well. In fact, this option is becoming popular enough that VMware even has a KB dedicated to ensuring the correct configuration of Citrix Netscalers in View environments. VMware View and VMware vCloud Director have redundant components that should be load balanced for best performance and high availability. If you have either of these products and are already using Citrix Netscalers to proxy HDX connections or load balance Citrix components or other portions of your infrastructure, why not use them for VMware as well? Pictured above is a high-level overview of load balancing some internal-facing View Connection Servers. Users connect to a VIP defined on the Netscalers (1), which directs them to the least busy View Connection Server (2), which then connects them to the appropriate desktop based on user entitlement (3). After the initial connection process, the user connects directly to their desktop over PCoIP.
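As a rough illustration, the NetScaler side of that flow can be sketched with a few CLI commands. All server names, IPs and ports below are hypothetical examples, not a recommended production configuration:

```shell
# Define the two internal View Connection Servers:
add server view-cs1 10.0.1.11
add server view-cs2 10.0.1.12

# Wrap them as SSL services:
add service svc-view-cs1 view-cs1 SSL 443
add service svc-view-cs2 view-cs2 SSL 443

# Create the VIP users connect to, with source-IP persistence so a
# user's session stays pinned to one Connection Server:
add lb vserver vip-view SSL 10.0.1.100 443 -persistenceType SOURCEIP
bind lb vserver vip-view svc-view-cs1
bind lb vserver vip-view svc-view-cs2
```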
3) XenDesktop and XenApp on ESXi

This is actually an extremely popular combination, and the reasons are numerous and varied. You get 32-host clusters (XenServer supports only 16, and VMware View on ESXi is limited to 8), Storage vMotion and Storage DRS (XenServer doesn’t have these features, and you can’t use them with VMware View), memory overcommitment (only ESXi has legitimate overcommit technology), Storage I/O Control, Network I/O Control, multi-NIC vMotion, Auto Deploy, and many more features that you can only get from the ESXi hypervisor. Using XenApp and XenDesktop on top of ESXi gets you the most robust combination of hypervisor and application/desktop virtualization technology possible.
4) XenApp as a connection broker for VMware View
This option intrigues me from an architectural point of view, but I have yet to see it used in a production environment. With this option you would publish the View Client from a XenApp server. Users would connect over HDX/ICA across external or WAN connections, and the XenApp server would then connect to the View desktop on the LAN over PCoIP. I can think of a couple of benefits to this off-hand. First, HDX generally performs better over high-latency connections, so there could be a user experience boost. Second, VMware View uses a “Security Server” to proxy external PCoIP connections; the Security Server software just resides on a Windows Server OS, and a hardened security appliance like a Netscaler would be more secure. What are the flaws in this method? I’d be interested to see how things like printing and USB redirection would work in such an environment, but for me, it’s definitely something I’d like to explore more.
So, those are a few of the possibilities for integrating VMware and Citrix technologies. What other combinations can you think of? Any other benefits or flaws in the methods mentioned above?