Integrations

Integrations are Kubernetes components and external tools that can be used with NVIDIA Run:ai for development, training, orchestration, data access, and monitoring.

Integrations fall into two support levels:

  • Supported integrations (out of the box) - NVIDIA Run:ai includes built-in support and documentation. You may still need cluster-side installation (for example, installing an operator and its CRDs) before you can use the integration.

  • Community Support integrations - Not supported out of the box, but commonly used; prior customer support experience and reference guides are available.

Supported Integrations

Frameworks

| Framework | Category | Supported Version | Additional Information |
| --- | --- | --- | --- |
| Dynamo operator | Distributed inference | 0.7.0 | The Dynamo operator is a Kubernetes operator that simplifies the deployment, configuration, and lifecycle management of DynamoGraphs. NVIDIA Run:ai provides out-of-the-box support for submitting Dynamo workloads via YAML. See the Dynamo operator documentation for more details. |
| NIM operator | Model serving | 3.0.x | The NVIDIA NIM Operator enables Kubernetes cluster administrators to operate the software components and services necessary to deploy NVIDIA NIM and NVIDIA NeMo microservices in Kubernetes. NVIDIA Run:ai provides out-of-the-box support for submitting NIM operator workloads via YAML. See the NIM operator documentation for more details. |
| LeaderWorkerSet (LWS) | Distributed inference | 0.6.0 or higher | NVIDIA Run:ai provides out-of-the-box support for submitting LWS workloads via YAML, and for NVIDIA Run:ai native distributed inference workloads that use LWS via the API. |
| Kubeflow MPI | Distributed training | MPI Operator v0.6.0 or higher | NVIDIA Run:ai provides out-of-the-box support for submitting MPI workloads via the API, CLI, or UI. See Distributed training for more details. |
| PyTorch | Distributed training | Kubeflow Training Operator v1.9.2 | NVIDIA Run:ai provides out-of-the-box support for submitting PyTorch workloads via the API, CLI, or UI. See Distributed training for more details. |
| TensorFlow | Distributed training | Kubeflow Training Operator v1.9.2 | NVIDIA Run:ai provides out-of-the-box support for submitting TensorFlow workloads via the API, CLI, or UI. See Distributed training for more details. |
| XGBoost | Distributed training | Kubeflow Training Operator v1.9.2 | NVIDIA Run:ai provides out-of-the-box support for submitting XGBoost workloads via the API, CLI, or UI. See Distributed training for more details. |
| JAX | Distributed training | Kubeflow Training Operator v1.9.2 | NVIDIA Run:ai provides out-of-the-box support for submitting JAX workloads via the API, CLI, or UI. See Distributed training for more details. |
| Triton | Orchestration | Any version | Used via the Triton Docker base image. |
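
The YAML submission path for the frameworks above follows a common pattern: the workload's pod template names the NVIDIA Run:ai scheduler and carries a project label. Below is a minimal PyTorchJob sketch. It assumes the Kubeflow Training Operator is installed, that the scheduler is registered as `runai-scheduler`, and that projects are selected with a `runai/queue` label; the scheduler name, label key, queue value `team-a`, image, and command are all assumptions or placeholders to verify against your cluster configuration.

```yaml
# Minimal PyTorchJob sketch (assumes Kubeflow Training Operator v1.9.x is installed).
# The scheduler name and queue label are assumptions - verify them against your
# NVIDIA Run:ai cluster configuration before use.
apiVersion: kubeflow.org/v1
kind: PyTorchJob
metadata:
  name: pytorch-dist-example
  labels:
    runai/queue: team-a                  # assumed project/queue label; "team-a" is a placeholder
spec:
  pytorchReplicaSpecs:
    Master:
      replicas: 1
      restartPolicy: OnFailure
      template:
        spec:
          schedulerName: runai-scheduler # assumed NVIDIA Run:ai scheduler name
          containers:
            - name: pytorch
              image: pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime  # replace with your training image
              command: ["python", "-m", "torch.distributed.run", "train.py"]  # placeholder entrypoint
              resources:
                limits:
                  nvidia.com/gpu: 1
    Worker:
      replicas: 2
      restartPolicy: OnFailure
      template:
        spec:
          schedulerName: runai-scheduler
          containers:
            - name: pytorch
              image: pytorch/pytorch:2.3.0-cuda12.1-cudnn8-runtime
              command: ["python", "-m", "torch.distributed.run", "train.py"]
              resources:
                limits:
                  nvidia.com/gpu: 1
```

The same scheduler-name and project-label pattern applies when submitting Dynamo, NIM operator, or LWS resources via YAML.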

Development Tools

| Tool | Category | Additional Information |
| --- | --- | --- |
| Jupyter Notebook | Development | NVIDIA Run:ai provides integrated support for Jupyter Notebooks. See the Jupyter Notebook quick start example. |
| PyCharm | Development | Containers created by NVIDIA Run:ai can be accessed via PyCharm. |
| Visual Studio Code | Development | Containers created by NVIDIA Run:ai can be accessed via Visual Studio Code. Visual Studio Code web can be launched automatically from the NVIDIA Run:ai console. |

Storage and Registries

| Tool | Category | Additional Information |
| --- | --- | --- |
| Docker Registry | Repositories | NVIDIA Run:ai allows using a Docker registry as a Credentials asset. |
| GitHub | Storage | NVIDIA Run:ai communicates with GitHub by defining it as a data source asset. |
| S3 | Storage | NVIDIA Run:ai communicates with S3 by defining it as a data source asset. |

Experiment Tracking and Monitoring

| Tool | Category | Additional Information |
| --- | --- | --- |
| TensorBoard | Experiment tracking | NVIDIA Run:ai comes with a preset TensorBoard Environment asset. |

Infrastructure and Cost Optimization

| Tool | Category | Additional Information |
| --- | --- | --- |
| Karpenter | Cost optimization | NVIDIA Run:ai provides out-of-the-box support for Karpenter to save cloud costs. Integration notes for Karpenter can be found here. |
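
As a reference point for the Karpenter integration, here is a minimal NodePool sketch for a GPU node group. It assumes Karpenter v1 on AWS with a pre-existing EC2NodeClass named `default`; the instance families, taint, and consolidation settings are illustrative assumptions, not NVIDIA Run:ai requirements.

```yaml
# Minimal Karpenter v1 NodePool sketch for GPU nodes (assumes the AWS provider
# and an existing EC2NodeClass named "default" - both assumptions for illustration).
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu-nodes
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default                  # assumed pre-existing node class
      requirements:
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["g5", "p4d"]        # illustrative GPU instance families
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
      taints:
        - key: nvidia.com/gpu          # keep non-GPU pods off these nodes
          value: "true"
          effect: NoSchedule
  disruption:
    consolidationPolicy: WhenEmpty     # scale empty GPU nodes down to save cost
    consolidateAfter: 5m
```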

Community Support Integrations

Our Customer Success team has prior experience assisting customers with setup. In many cases, the NVIDIA Enterprise Support Portal may include additional reference documentation provided on an as-is basis.

| Tool | Category | Additional Information |
| --- | --- | --- |
| Apache Airflow | Orchestration | Airflow workflows can be scheduled with the NVIDIA Run:ai Scheduler. Sample code: How to integrate NVIDIA Run:ai with Apache Airflow. |
| Argo Workflows | Orchestration | Argo workflows can be scheduled with the NVIDIA Run:ai Scheduler. Sample code: How to integrate NVIDIA Run:ai with Argo Workflows. |
| ClearML | Experiment tracking | ClearML workloads can be scheduled with the NVIDIA Run:ai Scheduler. |
| JupyterHub | Development | NVIDIA Run:ai workloads can be submitted via JupyterHub. |
| Kubeflow notebooks | Development | Kubeflow notebooks can be launched with the NVIDIA Run:ai Scheduler. Sample code: How to integrate NVIDIA Run:ai with Kubeflow. |
| Kubeflow Pipelines | Orchestration | Kubeflow pipelines can be scheduled with the NVIDIA Run:ai Scheduler. Sample code: How to integrate NVIDIA Run:ai with Kubeflow. |
| MLflow | Model serving | MLflow can be used together with the NVIDIA Run:ai Scheduler. |
| Ray | Training, inference, data processing | Ray jobs can be scheduled with the NVIDIA Run:ai Scheduler. Sample code: How to integrate NVIDIA Run:ai with Ray. |
| Seldon Core | Orchestration | Seldon Core workloads can be scheduled with the NVIDIA Run:ai Scheduler. |
| Spark | Orchestration | Spark workflows can be scheduled with the NVIDIA Run:ai Scheduler. |
| Weights & Biases | Experiment tracking | W&B workloads can be scheduled with the NVIDIA Run:ai Scheduler. Sample code: How to integrate with Weights and Biases. |
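
To illustrate how these tools hand pods to the NVIDIA Run:ai Scheduler, here is a minimal Argo Workflow sketch. The scheduler name `runai-scheduler` and the `runai/queue` project label are assumptions to verify against your cluster, and the image, command, and GPU request are placeholders.

```yaml
# Minimal Argo Workflow sketch run under the NVIDIA Run:ai scheduler.
# Scheduler name and queue label are assumptions - verify against your cluster.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: runai-example-
  labels:
    runai/queue: team-a            # assumed project/queue label; "team-a" is a placeholder
spec:
  entrypoint: main
  schedulerName: runai-scheduler   # assumed scheduler name, applied to all workflow pods
  templates:
    - name: main
      container:
        image: python:3.11-slim    # placeholder image
        command: ["python", "-c", "print('hello from a Run:ai-scheduled pod')"]
        resources:
          limits:
            nvidia.com/gpu: 1      # illustrative GPU request
```

The other tools in this table integrate along the same general lines, by pointing the pods they generate at the NVIDIA Run:ai Scheduler.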
