Integrations
Integrations are Kubernetes components and external tools that can be used with NVIDIA Run:ai for development, training, orchestration, data access, and monitoring.
Integrations fall into two support levels:

- Supported integrations (out of the box): NVIDIA Run:ai includes built-in support and documentation. You may still need cluster-side installation (for example, installing an operator and its CRDs) before you can use the integration.
- Community support integrations: not supported out of the box, but commonly used; our Customer Success team has prior experience helping customers set them up, and reference guides may be available.
Supported Integrations
Frameworks
| Tool | Category | Supported version | Notes |
| --- | --- | --- | --- |
| Dynamo operator | Distributed inference | 0.7.0 | A Kubernetes operator that simplifies the deployment, configuration, and lifecycle management of DynamoGraphs. NVIDIA Run:ai provides out-of-the-box support for submitting Dynamo workloads via YAML. See the Dynamo operator documentation for more details. |
| NIM operator | Model serving | 3.0.x | Enables Kubernetes cluster administrators to operate the software components and services necessary to deploy NVIDIA NIMs and NVIDIA NeMo microservices in Kubernetes. NVIDIA Run:ai provides out-of-the-box support for submitting NIM operator workloads via YAML. See the NIM operator documentation for more details. |
| LeaderWorkerSet (LWS) | Distributed inference | 0.6.0 or higher | |
| Kubeflow MPI | Distributed training | MPI Operator v0.6.0 or higher | NVIDIA Run:ai provides out-of-the-box support for submitting MPI workloads via API, CLI, or UI. See Distributed training for more details. |
| PyTorch | Distributed training | Kubeflow Training Operator v1.9.2 | NVIDIA Run:ai provides out-of-the-box support for submitting PyTorch workloads via API, CLI, or UI. See Distributed training for more details; a minimal manifest sketch follows this table. |
| TensorFlow | Distributed training | Kubeflow Training Operator v1.9.2 | NVIDIA Run:ai provides out-of-the-box support for submitting TensorFlow workloads via API, CLI, or UI. See Distributed training for more details. |
| XGBoost | Distributed training | Kubeflow Training Operator v1.9.2 | NVIDIA Run:ai provides out-of-the-box support for submitting XGBoost workloads via API, CLI, or UI. See Distributed training for more details. |
| JAX | Distributed training | Kubeflow Training Operator v1.9.2 | NVIDIA Run:ai provides out-of-the-box support for submitting JAX workloads via API, CLI, or UI. See Distributed training for more details. |
| Triton | Orchestration | Any version | Used via a Docker base image. |
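To make the submission paths above concrete, here is a minimal sketch of a Kubeflow PyTorchJob handed to the NVIDIA Run:ai scheduler. The project namespace convention (runai-&lt;project&gt;), the schedulerName value, the image, and the GPU counts are illustrative assumptions, not verbatim supported configuration; see Distributed training for the supported submission flow.

```yaml
# Minimal sketch: a Kubeflow Training Operator PyTorchJob whose pods are
# pointed at the NVIDIA Run:ai scheduler. Values marked as assumptions are
# placeholders, not documented NVIDIA Run:ai configuration.
apiVersion: kubeflow.org/v1
kind: PyTorchJob
metadata:
  name: pytorch-dist-example
  namespace: runai-team-a                # assumption: project namespaces follow runai-<project>
spec:
  pytorchReplicaSpecs:
    Master:
      replicas: 1
      restartPolicy: OnFailure
      template:
        spec:
          schedulerName: runai-scheduler # assumption: route pods to the NVIDIA Run:ai scheduler
          containers:
            - name: pytorch              # the Training Operator expects this container name
              image: pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime  # placeholder image
              command: ["torchrun", "train.py"]                     # hypothetical entrypoint
              resources:
                limits:
                  nvidia.com/gpu: 1
    Worker:
      replicas: 2
      restartPolicy: OnFailure
      template:
        spec:
          schedulerName: runai-scheduler
          containers:
            - name: pytorch
              image: pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime
              command: ["torchrun", "train.py"]
              resources:
                limits:
                  nvidia.com/gpu: 1
```

TFJob, XGBoostJob, and JAX jobs follow the same replica-spec pattern under the Training Operator, and the Dynamo and NIM operators are likewise driven by applying their CRDs as YAML.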
Development Tools
| Tool | Category | Notes |
| --- | --- | --- |
| Jupyter Notebook | Development | NVIDIA Run:ai provides integrated support for Jupyter Notebooks. See the Jupyter Notebook quick start example. |
| PyCharm | Development | Containers created by NVIDIA Run:ai can be accessed from PyCharm (see the sketch after this table). |
| Visual Studio Code | Development | Containers created by NVIDIA Run:ai can be accessed from Visual Studio Code. VS Code web can be launched automatically from the NVIDIA Run:ai console. |
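The notebook and IDE rows above all attach to a running workload container. Purely as an illustration of what such a container looks like (in practice you create the workspace from the NVIDIA Run:ai UI, CLI, or API; the namespace, scheduler name, and image below are assumptions), a plain pod serving Jupyter might be sketched as:

```yaml
# Illustrative sketch only: a pod running a Jupyter server, scheduled by
# NVIDIA Run:ai, that PyCharm or VS Code could then attach to.
apiVersion: v1
kind: Pod
metadata:
  name: jupyter-workspace
  namespace: runai-team-a            # assumption: runai-<project> namespace convention
spec:
  schedulerName: runai-scheduler     # assumption: schedule via NVIDIA Run:ai
  containers:
    - name: notebook
      image: jupyter/base-notebook   # placeholder image
      ports:
        - containerPort: 8888        # Jupyter's default port; IDEs connect here
      resources:
        limits:
          nvidia.com/gpu: 1
```

Once the container is running, PyCharm or VS Code connects to the exposed server (for example through a port-forward), which is the access pattern described in the table above.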
Storage and Registries
Experiment Tracking and Monitoring
Infrastructure and Cost Optimization
| Tool | Category | Notes |
| --- | --- | --- |
| Karpenter | Cost optimization | NVIDIA Run:ai provides out-of-the-box support for Karpenter to reduce cloud costs. See the Karpenter integration notes; a configuration sketch follows this table. |
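As a sketch of where the cost savings come from (this is general Karpenter v1 usage, not NVIDIA Run:ai-specific configuration; the instance types, node class, and limits are placeholders), a NodePool can steer GPU autoscaling onto cheaper spot capacity:

```yaml
# Minimal Karpenter v1 NodePool sketch: autoscale GPU nodes on spot instances
# and cap the total GPU count. All values are illustrative.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu-spot
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]                      # prefer spot capacity to cut costs
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["g5.xlarge", "g5.2xlarge"]   # placeholder GPU instance types
      nodeClassRef:                             # assumes the AWS provider's EC2NodeClass
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  limits:
    nvidia.com/gpu: 8                           # cap the GPUs Karpenter may provision
```

Karpenter provisions and consolidates the nodes while the NVIDIA Run:ai scheduler places workloads on them; see the Karpenter integration notes for the supported configuration.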
Community Support Integrations
Our Customer Success team has prior experience assisting customers with setup, and the NVIDIA Enterprise Support Portal may include additional reference documentation, provided on an as-is basis.
| Tool | Category | Notes |
| --- | --- | --- |
| Apache Airflow | Orchestration | Airflow workflows can be scheduled with the NVIDIA Run:ai Scheduler. Sample code: How to integrate NVIDIA Run:ai with Apache Airflow. |
| Argo Workflows | Orchestration | Argo workflows can be scheduled with the NVIDIA Run:ai Scheduler. Sample code: How to integrate NVIDIA Run:ai with Argo Workflows; a minimal sketch follows this table. |
| ClearML | Experiment tracking | ClearML workloads can be scheduled with the NVIDIA Run:ai Scheduler. |
| JupyterHub | Development | NVIDIA Run:ai workloads can be submitted via JupyterHub. |
| Kubeflow notebooks | Development | Kubeflow notebooks can be launched with the NVIDIA Run:ai Scheduler. Sample code: How to integrate NVIDIA Run:ai with Kubeflow. |
| Kubeflow Pipelines | Orchestration | Kubeflow pipelines can be scheduled with the NVIDIA Run:ai Scheduler. Sample code: How to integrate NVIDIA Run:ai with Kubeflow. |
| MLflow | Model serving | MLflow can be used together with the NVIDIA Run:ai Scheduler. |
| Ray | Training, inference, data processing | Ray jobs can be scheduled with the NVIDIA Run:ai Scheduler. Sample code: How to integrate NVIDIA Run:ai with Ray. |
| Seldon Core | Orchestration | Seldon Core workloads can be scheduled with the NVIDIA Run:ai Scheduler. |
| Spark | Orchestration | Spark workflows can be scheduled with the NVIDIA Run:ai Scheduler. |
| Weights & Biases | Experiment tracking | W&B workloads can be scheduled with the NVIDIA Run:ai Scheduler. Sample code: How to integrate with Weights and Biases. |
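Most rows above share one pattern: run the tool's pods in an NVIDIA Run:ai project namespace and hand them to the NVIDIA Run:ai Scheduler. A minimal sketch with Argo Workflows (the namespace convention and scheduler name are assumptions; the referenced sample code is the authoritative recipe):

```yaml
# Minimal sketch: an Argo Workflow whose pods are scheduled by NVIDIA Run:ai.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: runai-train-
  namespace: runai-team-a            # assumption: runai-<project> namespace convention
spec:
  schedulerName: runai-scheduler     # assumption: route all workflow pods to NVIDIA Run:ai
  entrypoint: train
  templates:
    - name: train
      container:
        image: python:3.11-slim      # placeholder image
        command: ["python", "-c", "print('scheduled by NVIDIA Run:ai')"]
        resources:
          limits:
            nvidia.com/gpu: 1
```

The Airflow, Kubeflow, Ray, and Spark integrations follow the same idea with their own pod templates; see the linked sample code for each tool.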