Introduction

The Nutanix Prism Central management plane provides one-click simplicity for managing your global hybrid cloud infrastructure from a unified console. With its many powerful features and ease of use, Prism Central quickly becomes a critical part of infrastructure operations. As such, a resilience strategy for the Prism Central environment is essential and should be an integral part of an organization's business continuity plan.

A major challenge arises when Prism Central is affected by a hosting cluster failure, a site failure, or data loss due to a ransomware attack. In such situations, there needs to be a seamless way to back up and restore Prism Central. In addition to backup and restore, high availability of Prism Central is equally important. This post describes the resilience options available to you with the release of PC2024.1.

Resilience Use-cases

Disasters come in multiple forms: natural calamities, power failures, network failures, security breaches, or attacks. When disasters strike, there are several types of Prism Central failures that can occur:

Types of Prism Central failure
  1. Prism Central failure: a Prism Central VM crashes or becomes unavailable.
  2. Cluster failure: the Prism Element (PE) cluster on which Prism Central is hosted becomes unavailable, taking Prism Central down along with it.
  3. Datacenter/Availability Zone/Site failure: the entire datacenter or site becomes unavailable.

Solutions - Prism Central High Availability

Prism Central supports two high availability solutions:

  1. Scale-Out Prism Central: Increase the number of Prism Central VMs to 3 to tolerate the failure of one Prism Central VM. Scaling out Prism Central also increases the maximum supported scale.
  2. Prism Central VM hosting node High Availability: When the node hosting a Prism Central VM fails, the Prism Central VM automatically respawns on another node in the same cluster that has sufficient resources. Any configured anti-affinity policies are honored.

Documentation on High Availability in Prism Central can be found here:

https://portal.nutanix.com/page/documents/details?targetId=Prism-Central-Guide:mul-pc-high-availability-c.html

Solutions - Prism Central Backup and Restore

Prism Central supports two Backup and Restore solutions:

  1. Continuous Backup: Continuously back up the Prism Central VM to up to three registered clusters that act as backup targets, with an RPO of thirty minutes and an RTO of two hours. This is ideal for cluster-level failure or Prism Central corruption scenarios.
  2. Point-in-Time Backup: Create multiple off-site point-in-time backups of Prism Central with an RPO of two hours and an RTO of two hours. Prism Central can be restored from backups as old as one month. Today, backups can be exported off-site to AWS S3 buckets, which is ideal for recovering from ransomware attacks, data loss, and site or availability zone failures. We plan to support additional S3-compatible endpoints as backup targets in upcoming releases.
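
To make the trade-off between the two options concrete, the following minimal Python sketch encodes the RPO and RTO figures quoted above and filters the options by a required recovery point objective. The class, constant, and function names are hypothetical and purely illustrative; they are not part of any Nutanix API.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class BackupOption:
    """One Prism Central backup option, using the figures quoted in this post."""
    name: str
    target: str
    rpo: timedelta  # maximum expected data-loss window
    rto: timedelta  # expected time to restore service


# Figures come from the descriptions above; the names are illustrative only.
OPTIONS = [
    BackupOption("Continuous Backup", "up to three registered clusters",
                 rpo=timedelta(minutes=30), rto=timedelta(hours=2)),
    BackupOption("Point-in-Time Backup", "AWS S3 bucket",
                 rpo=timedelta(hours=2), rto=timedelta(hours=2)),
]


def options_meeting_rpo(required_rpo: timedelta) -> list[str]:
    """Return the names of the options whose RPO is within the required RPO."""
    return [option.name for option in OPTIONS if option.rpo <= required_rpo]


if __name__ == "__main__":
    # An organization that can tolerate at most one hour of lost changes.
    print(options_meeting_rpo(timedelta(hours=1)))
```

Because both options can be enabled at the same time, a filter like this is only a starting point for deciding which one covers a given requirement.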

How Does it Work? 

Both backup and restore options work in two stages:

  1. Backup: The first step is to trigger configuration replication or backup from Prism Central. The backup is stored on a registered Prism Element cluster for continuous backup or in an AWS S3 bucket for point-in-time backups. The initial full backup takes up to one hour to complete. Once the baseline backup is created, subsequent configuration changes are backed up asynchronously. For continuous backups, the overall RPO is 30 minutes, with some configurations having an RPO as low as 200 ms. For point-in-time backups, the RPO follows the configured schedule. This incremental approach protects the Prism Central configuration efficiently and with minimal overhead.
  2. Restore: In the event of a disaster requiring Prism Central restoration, the recovery process can be initiated from any available Prism Element cluster registered with Prism Central. A new Prism Central instance is automatically deployed by downloading the latest installation from the Nutanix download portal. The backed-up configuration is then imported, either from a Prism Element cluster hosting the continuous backup or from the AWS S3 bucket containing the point-in-time backups. The entire restoration process takes approximately 2 hours to complete, ensuring a swift recovery of Prism Central's operational state with minimal downtime.
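
The timing described in the two stages above can be worked through with simple arithmetic. The sketch below is a rough illustration, not a description of Prism Central internals: it estimates the oldest recovery point and the time at which service returns for a failure at a given moment. The function name and the example timestamp are hypothetical, and the sketch assumes that the restore starts the moment the failure occurs; the 30-minute RPO and 2-hour restore time are the figures quoted above.

```python
from datetime import datetime, timedelta


def recovery_estimate(failure_time: datetime,
                      rpo: timedelta,
                      restore_duration: timedelta) -> tuple[datetime, datetime]:
    """Return (oldest possible recovery point, estimated restore completion).

    Assumption: the restore is started immediately at failure_time.
    """
    oldest_recovery_point = failure_time - rpo
    restored_at = failure_time + restore_duration
    return oldest_recovery_point, restored_at


if __name__ == "__main__":
    # Hypothetical failure time, chosen only for illustration.
    failure = datetime(2024, 6, 1, 12, 0)
    point, restored = recovery_estimate(
        failure,
        rpo=timedelta(minutes=30),        # continuous backup RPO from the post
        restore_duration=timedelta(hours=2),
    )
    print("Changes made after", point, "may be lost")
    print("Prism Central is expected back at about", restored)
```

With these inputs, changes made after 11:30 may be lost and service is expected back at about 14:00. In practice, the time needed to detect the failure and start the restore adds to the total downtime.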

Settings and configurations for the following products in Prism Central are backed up and restored: Intelligent Operations, Flow Virtual Networking, Flow Network Security, and Nutanix Disaster Recovery.

In addition, the following configurations within Prism Central are also backed up and restored:

  1. Policies - Access control, Alert notification, NGT, Storage
  2. Categories and Entities
  3. Virtual Networks, VPN configs, Subnets, Availability Zones
  4. IAMv2
  5. Rules - Network Security groups, Protection, Recovery - Plans, Audits
  6. 90 days metrics
  7. Licenses, Syslog configs, User groups

Self Service, Catalog, Images, VM Templates, and metrics more than 90 days old are not supported for backup and restore. Support for the Nutanix Files Storage, Objects Storage, and LCM products is in progress and will be made available in subsequent Prism Central releases.

Scenarios and Solution Recommendations

Organizations can simultaneously enable high availability and backup/restore capabilities for Prism Central, leveraging these complementary resilience options to address various failure scenarios.

| Scenario | Recommended Solution |
|---|---|
| Site/Availability Zone Failure | Use Prism Central backup to AWS S3 |
| Prism Central VM Failure | Enable Prism Central High Availability |
| Node (hosting Prism Central) Failure | Enable AHV VM node failure HA |
| Ransomware/Data Loss | Use Prism Central backup to AWS S3 |
| High Availability of Prism Central | Use Prism Central Scale-Out |
| Cluster Failure (in a multi-cluster deployment) | Use Continuous backup |
| Cluster Failure (in a single-cluster deployment) | Use Prism Central backup to AWS S3 |
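
Since several scenarios share the same recommendation, it can help to work out the minimal set of solutions to enable for the scenarios an organization cares about. The following Python sketch does this with a plain dictionary built from the scenario recommendations above. The dictionary and function names are hypothetical and the code is illustrative only; it does not call any Nutanix API.

```python
# Scenario-to-solution mapping copied from the recommendations in this post.
RECOMMENDATIONS = {
    "Site/Availability Zone Failure": "Use Prism Central backup to AWS S3",
    "Prism Central VM Failure": "Enable Prism Central High Availability",
    "Node (hosting Prism Central) Failure": "Enable AHV VM node failure HA",
    "Ransomware/Data Loss": "Use Prism Central backup to AWS S3",
    "High Availability of Prism Central": "Use Prism Central Scale-Out",
    "Cluster Failure (in a multi-cluster deployment)": "Use Continuous backup",
    "Cluster Failure (in a single-cluster deployment)": "Use Prism Central backup to AWS S3",
}


def solutions_for(scenarios: list[str]) -> list[str]:
    """Return the de-duplicated recommended solutions, in first-seen order."""
    solutions: list[str] = []
    for scenario in scenarios:
        recommendation = RECOMMENDATIONS[scenario]  # KeyError on unknown scenario
        if recommendation not in solutions:
            solutions.append(recommendation)
    return solutions


if __name__ == "__main__":
    wanted = [
        "Ransomware/Data Loss",
        "Site/Availability Zone Failure",
        "Prism Central VM Failure",
    ]
    for solution in solutions_for(wanted):
        print(solution)
```

In this example, the ransomware and site failure scenarios are both covered by the AWS S3 backup, so only two solutions need to be enabled.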

Note: Backup and restore of Prism Central using third-party backup software such as HYCU or Veeam, or using Protection Domains, is not supported and may leave Prism Central in an inconsistent state after recovery.

General Availability

Prism Central Backup and Restore and Prism Central High Availability are generally available with no additional licensing requirements.

Summary

To summarize, release PC2024.1 addresses several resilience use cases for Prism Central:

  1. High Availability
    1. Scale-Out: Increase the number of Prism Central VMs to 3 to tolerate the failure of one Prism Central VM.
    2. Node HA for Prism Central VMs: Automatic respawn on another node in the same cluster upon node failure.
  2. Backup and Restore
    1. Continuous Backup: Ongoing Prism Central VM backups to up to 3 registered clusters, 30-minute RPO, 2-hour RTO.
    2. Point-in-Time Backup: Multiple off-site Prism Central backups, 2-hour RPO and RTO, restore from backups up to 1 month old, AWS S3 support.

 

© 2024 Nutanix, Inc. All rights reserved. Nutanix, the Nutanix logo, and all Nutanix product, feature and service names mentioned herein are registered trademarks or trademarks of Nutanix, Inc. in the United States and other countries. Kubernetes is a registered trademark of the Linux Foundation. Other brand names mentioned herein are for identification purposes only and may be the trademarks of their respective holder(s). This post may contain links to external websites that are not part of Nutanix.com. Nutanix does not control these sites and disclaims all responsibility for the content or accuracy of any external site. Our decision to link to an external site should not be considered an endorsement of any content on such a site. Certain information contained in this post may relate to or be based on studies, publications, surveys and other data obtained from third-party sources and our own internal estimates and research. While we believe these third-party studies, publications, surveys and other data are reliable as of the date of this post, they have not been independently verified, and we make no representation as to the adequacy, fairness, accuracy, or completeness of any information obtained from third-party sources.

This post may contain express and implied forward-looking statements, which are not historical facts and are instead based on our current expectations, estimates and beliefs. The accuracy of such statements involves risks and uncertainties and depends upon future events, including those that may be beyond our control, and actual results may differ materially and adversely from those anticipated or implied by such statements. Any forward-looking statements included herein speak only as of the date hereof and, except as required by law, we assume no obligation to update or otherwise revise any of such forward-looking statements to reflect subsequent events or circumstances. Any future product or product feature information is intended to outline general product directions, and is not a commitment, promise or legal obligation for Nutanix to deliver any functionality. This information should not be used when making a purchasing decision.