In this post, we go through the installation process of Kubeflow, an open source machine learning platform that takes advantage of Kubernetes capabilities to deliver an end-to-end machine learning workflow to data scientists, ML engineers, and DevOps professionals. The testbed configured in this tutorial will be used for exploring the building blocks of the platform in future installments of this tutorial series.

Kubeflow can be installed on any Kubernetes cluster that has a minimum of 4 CPUs, 12 GB of RAM, and 50 GB of storage. It can be installed on managed Kubernetes services such as Amazon Web Services' Elastic Kubernetes Service (EKS), Azure Kubernetes Service, Google Kubernetes Engine, and IBM Kubernetes Service. Kubeflow can also be installed in on-prem environments running Kubernetes on bare metal hosts. Refer to the Kubeflow documentation for platform-specific installation details.
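Before installing, it is worth confirming that your cluster meets these minimums. As a quick sketch (assuming `kubectl` is already configured to talk to the target cluster), you can list each node's allocatable resources:

```shell
# List allocatable CPU, memory, and ephemeral storage per node;
# the cluster should total at least 4 CPUs, 12 GB RAM, and 50 GB storage
kubectl get nodes -o custom-columns=\
NAME:.metadata.name,\
CPU:.status.allocatable.cpu,\
MEMORY:.status.allocatable.memory,\
STORAGE:.status.allocatable.ephemeral-storage
```

Persistent storage for Kubeflow components also requires a default StorageClass, which you can check with `kubectl get storageclass`.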

For this tutorial, we will use the DeepOps installer from NVIDIA, which simplifies the installation process. In about 20 minutes, we will have a fully configured Kubeflow environment available to us. NVIDIA created DeepOps primarily for installing Kubernetes on a set of hosts with GPUs, but it can also target hosts without GPUs.
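At a high level, the DeepOps workflow is: clone the repository, bootstrap the provisioning environment, point the Ansible inventory at the target host(s), and run the cluster playbook. The sketch below outlines those steps; exact script and playbook paths can vary between DeepOps releases, so check the repository's README for your version:

```shell
# Clone the NVIDIA DeepOps repository and bootstrap the
# provisioning environment (installs Ansible and dependencies)
git clone https://github.com/NVIDIA/deepops.git
cd deepops
./scripts/setup.sh

# Edit config/inventory to list the target host(s), then
# deploy Kubernetes to the hosts in the k8s-cluster group
ansible-playbook -l k8s-cluster playbooks/k8s-cluster.yml

# Deploy Kubeflow on the freshly provisioned cluster
./scripts/k8s/deploy_kubeflow.sh
```

We will walk through each of these steps in detail as the tutorial proceeds.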

I recently built a custom machine for experimenting with AI. Based on an AMD Ryzen Threadripper 3990X CPU with 64 cores, an NVIDIA GeForce RTX 3090 GPU with 24 GB of memory and 10,496 CUDA cores, 128 GB of RAM, and 3 TB of NVMe storage, it is a powerhouse. It is a perfect candidate for running a single-node Kubernetes cluster backed by the NVIDIA drivers and CUDA Toolkit for GPU access.
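Once the node is provisioned, two quick checks confirm that the GPU is usable: the NVIDIA driver should see the card on the host, and Kubernetes should advertise it as a schedulable resource. A minimal sketch, assuming the NVIDIA device plugin has been deployed (DeepOps handles this on GPU hosts):

```shell
# Confirm the host sees the GPU through the NVIDIA driver
nvidia-smi

# Confirm Kubernetes advertises the GPU as an allocatable
# resource (reported as nvidia.com/gpu by the device plugin)
kubectl describe nodes | grep -i "nvidia.com/gpu"
```

If the second command shows `nvidia.com/gpu: 1`, pods can request the GPU through their resource limits.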


Install Kubernetes and Kubeflow on a GPU Host with NVIDIA DeepOps