Cluster restore
This section explains how to restore an NVIDIA Run:ai cluster in a different Kubernetes environment.
In the event of a critical Kubernetes failure, or if you want to migrate an NVIDIA Run:ai cluster to a new Kubernetes environment, reinstall the NVIDIA Run:ai cluster. Once the cluster is reinstalled and reconnected, projects, workloads, and other cluster data are synced automatically.
NVIDIA Run:ai cluster Advanced features and Customized deployment configurations are stored locally on the Kubernetes cluster. Backing them up and restoring them is optional and can be done separately.
Backup
Because cluster data is synced automatically, a back-up of that data is not required. The backup procedure is optional and applies only to advanced deployments, as explained above.
Backup cluster configurations
To back up the NVIDIA Run:ai cluster configurations:
Run the following command in your terminal:
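A minimal sketch of such a command, assuming the configuration is stored in a `runaiconfig` custom resource named `runai` in the `runai` namespace. These names, the environment variables, and the helper name `backup_runaiconfig` are assumptions for illustration and are not stated in this section, so adjust them to match your installation:

```shell
# Sketch only. Assumed, not confirmed by this section: resource kind
# "runaiconfig", resource name "runai", namespace "runai".
KUBECTL="${KUBECTL:-kubectl}"                 # override to use another binary
RUNAI_NAMESPACE="${RUNAI_NAMESPACE:-runai}"   # assumed default namespace

# backup_runaiconfig is a hypothetical helper name.
# Usage: backup_runaiconfig [output-file]
backup_runaiconfig() {
  local out="${1:-runaiconfig_back.yaml}"
  # Export the resource as YAML and redirect it into the back-up file.
  "$KUBECTL" get runaiconfig runai -n "$RUNAI_NAMESPACE" -o yaml > "$out"
}
```

Calling `backup_runaiconfig` with no argument writes to `runaiconfig_back.yaml` in the current directory.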
Once the runaiconfig_back.yaml back-up file is created, save it externally so that it can be retrieved later.
Restore
Follow the steps below to restore the NVIDIA Run:ai cluster on a new Kubernetes environment.
Prerequisites
Before restoring the NVIDIA Run:ai cluster, validate that it is both disconnected and uninstalled.
If the Kubernetes cluster is still available, uninstall the NVIDIA Run:ai cluster. Make sure not to remove the cluster from the Control Plane
Navigate to the Cluster page in the NVIDIA Run:ai platform
Search for the cluster, and make sure its status is Disconnected
Reinstalling the NVIDIA Run:ai cluster
Follow the NVIDIA Run:ai cluster installation instructions and ensure all prerequisites are met
If you have a back-up of the cluster configurations, reload it once the installation is complete
Navigate to the Cluster page in the NVIDIA Run:ai platform
Search for the cluster, and make sure its status is Connected
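The reload step above can be sketched as follows, assuming the back-up is a YAML manifest that `kubectl` can read and that the new installation uses the `runai` namespace. The helper name `restore_runaiconfig` and the environment variables are hypothetical. Depending on how the back-up was exported, server-generated metadata fields such as `resourceVersion` and `uid` may need to be removed from the file before it is applied.

```shell
# Sketch only. Assumed, not confirmed by this section: the back-up is a
# kubectl-readable YAML manifest and the target namespace is "runai".
KUBECTL="${KUBECTL:-kubectl}"                 # override to use another binary
RUNAI_NAMESPACE="${RUNAI_NAMESPACE:-runai}"   # assumed default namespace

# restore_runaiconfig is a hypothetical helper name.
# Usage: restore_runaiconfig [backup-file]
restore_runaiconfig() {
  local backup="${1:-runaiconfig_back.yaml}"
  # Refuse to continue if the back-up file is missing or empty.
  if [ ! -s "$backup" ]; then
    echo "back-up file not found or empty: $backup" >&2
    return 1
  fi
  # Apply the saved configuration to the newly installed cluster.
  "$KUBECTL" apply -f "$backup" -n "$RUNAI_NAMESPACE"
}
```

Run it only after the installation has completed, for example `restore_runaiconfig runaiconfig_back.yaml`, and then confirm the Connected status as described above.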