Introduction

Imagine being on a long road trip. After a while the gasoline or batteries start to run low, the driver and passengers get tired, and it’s time for a pit stop. The car pulls over, the passengers get out, and everyone and everything is refueled and refreshed for the next leg.

But what if, instead of stopping, you could just push a button and the car would be refueled or recharged while you drive? Even better: how about swapping out the entire car for a newer model while you’re at it?

Metaphorically speaking, the Nutanix Cloud Clusters (NC2) solution on AWS can change the entire car – without the need for a pit stop. The system just keeps running while the bare metal is replaced underneath it, as if nothing had happened, except that there is now more power available. All the benefits – none of the downtime. And it can all be done with a single command through the NC2 management portal.

Animated gif representing concept of swapping i3.metal nodes to i4i.metal nodes

In this example we swap out i3.metal nodes for more powerful i4i.metal nodes while the cluster is running. The starting point is a cluster with three i3.metal nodes and the end state is the same cluster, now with three i4i.metal nodes. The change is seamless for the workloads running on top of NC2. Apart from a brief dip in north-south traffic during the network change, they experience no disruption.

Starting point

We start out with a plain NC2 on AWS cluster with three i3.metal nodes. In addition to the basic cluster components we have also opted to deploy the Prism Central control plane and the Flow Virtual Networking (FVN) overlay networking.

Screenshot of NC2 on AWS cluster with three i3.metal nodes

Multiple Virtual Machines (VMs) are running on the NC2 cluster. To monitor their health we start a continuous ping, the statistics of which can be evaluated after the cluster nodes have been replaced.

Diagram of AWS Cloud with AWS VPC

On the networking side we have set up No-NAT networking with Flow Virtual Networking, so the subnet the test VM is attached to is also accessible from the native AWS Virtual Private Cloud (VPC). In this case we are pinging the Linux NC2 test VM from a Windows EC2 instance in a separate AWS VPC, as per the diagram above.

Screenshot of VMs and ping of Linux NC2

Updating the Cluster Capacity settings in the NC2 management portal

The management portal for NC2 allows for easy updates to the cluster capacity and configuration. We highlight our cluster and navigate to Cluster Capacity where the node types and the number of nodes can be changed.

A few clicks later we have added three new i4i.metal nodes (bare-metal instances, in AWS parlance) to our original configuration of three i3.metal nodes, and we have also set the number of i3.metal nodes to zero. This way three new, more powerful nodes are added, and once all data has been transferred over, the old cluster nodes are removed and billing for them stops.

Animated gif of updating the Cluster Capacity setting in the NC2 management portal

The task has now been accepted by the NC2 management portal and is being executed in the background. VMs running on NC2 continue working as usual, unaware of the big changes to the system which are under way.

Screenshot of upgrade status showing 26 seconds left

EC2 bare-metal changes as seen from the AWS console

In the AWS console it is possible to watch the whole process: the i4i.metal nodes being added, the i3.metal and i4i.metal nodes running side by side while the cluster shifts onto the new nodes, and finally the decommissioning of the i3.metal nodes.


From a networking perspective, the i3.metal Elastic Network Interface (ENI) that was the active point of north-south communication for the cluster, and therefore part of the AWS VPC route table, has been shifted to an ENI on one of the new i4i.metal hosts post-migration.

Before and after screenshots of node change
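The same before-and-after check can be made programmatically. The sketch below uses made-up IDs and a dictionary shaped like an EC2 DescribeRouteTables response to show how one might look up which ENI currently carries a given destination; in a real environment the response would come from the AWS CLI or an SDK call rather than a hard-coded sample.

```python
# Sketch only: the IDs and CIDRs below are placeholders, and the dict mimics
# the shape of an EC2 DescribeRouteTables response rather than live data.

def eni_for_destination(route_tables, cidr):
    """Return the network interface ID routing traffic for the given CIDR."""
    for table in route_tables.get("RouteTables", []):
        for route in table.get("Routes", []):
            if route.get("DestinationCidrBlock") == cidr:
                return route.get("NetworkInterfaceId")
    return None

# Hypothetical response after the swap: the No-NAT overlay subnet now routes
# via an ENI on one of the new i4i.metal hosts.
sample_response = {
    "RouteTables": [
        {
            "RouteTableId": "rtb-0example",
            "Routes": [
                {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
                {"DestinationCidrBlock": "192.168.10.0/24",
                 "NetworkInterfaceId": "eni-0i4iexample"},
            ],
        }
    ]
}

print(eni_for_destination(sample_response, "192.168.10.0/24"))
# prints eni-0i4iexample
```

Running the same lookup before and after the capacity change should show the overlay route moving from an ENI on an i3.metal host to one on an i4i.metal host.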

Result

The node swap was completed without a hitch and without any input needed from the IT administrator managing the NC2 cluster – well, apart from initiating the change at the start. In this case our cluster was hosting only a handful of VMs and the entire process took just under one hour to complete. Naturally, the time required will increase with the amount of storage used and the load the cluster is under during the change.

As the new nodes are added, VMs and data are automatically migrated between hosts without the need for user intervention or a manual “re-balancing” effort. VMs remain available and data remains protected (at either Redundancy Factor (RF) 2 or RF3) at all times. All nodes in the cluster take part in the movement of data, which means it can happen relatively quickly.

Screenshot showing nodes have been migrated to i4i.metal
Screenshot of migration status showing as complete with a duration of 56 minutes

More importantly, the workloads have experienced just a blip in network connectivity and no downtime or reboots.

The Linux VM, which we started pinging at the beginning of the blog post, is still up and the pings are still getting through. Throughout the hour-long change a total of 3381 pings were sent, and 26 of these were lost (near enough 0% loss according to Windows).

Screenshot displaying ping statistics
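As a quick sanity check on those numbers, the loss works out to well under one percent:

```python
# Packet-loss arithmetic for the figures above: 26 lost out of 3381 sent.
sent, lost = 3381, 26
loss_pct = 100 * lost / sent
print(f"{loss_pct:.2f}% loss")  # prints 0.77% loss
```

Windows rounds this down to a whole number, which is why its ping summary reports 0% loss.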

The uptime command on the Linux host also shows that no VM reboot took place. Moreover, the SSH session from the Windows EC2 instance to the Linux VM on NC2 remained in place, uninterrupted, throughout the procedure.

Screenshot displaying uninterrupted SSH session throughout the procedure.

Conclusion

This example showed how quickly and easily an NC2 cluster can be migrated from one EC2 bare-metal instance type to another when using Nutanix Cloud Clusters on AWS. This is in sharp contrast to some other virtualization platforms in the public cloud. The same functionality can also be used to scale clusters up and down with similar ease.

These capabilities, coupled with the ability to hot-add resources (disk, vCPU and RAM, OS compatibility allowing) to VMs in virtually any configuration you choose, make NC2 one of the most flexible and scalable ways to run your workloads in a public cloud.

For more information, please visit the Nutanix Cloud Clusters page below: https://www.nutanix.com/products/nutanix-cloud-clusters