kubeflow

Ethereal Lv4

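1. Preparation

1.0 Determine the versions to download

1.0.1 Determine the Kubeflow version to download

For example, the Kubeflow 1.10 | Kubeflow release page explicitly states that the minimum supported Kubernetes version is 1.32.

If the cluster runs an older Kubernetes version, use Kubeflow 1.9 | Kubeflow instead, which is what this guide does.

Its release page lists the Kustomize version as:

Kustomize 5.2.1

Open the 1.9 branch, kubeflow/manifests at v1.9.1-branch, and check whether a newer patch release exists. Here 1.9.1 is available and lists the same version support information.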
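The version check described above can be sketched as a small shell helper. This is a minimal sketch, assuming GNU `sort -V` is available; the function name `version_ge` and the use of `jq` in the commented usage are illustrative choices, not part of the Kubeflow tooling.

```shell
# version_ge HAVE NEED: succeed if dotted version HAVE is >= NEED.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical usage against a live cluster (not executed here):
#   server=$(kubectl version -o json | jq -r '.serverVersion.gitVersion')  # e.g. v1.30.2
#   version_ge "${server#v}" 1.32 || echo "cluster is older than 1.32; use Kubeflow 1.9"
```

The helper only compares version strings, so it can be tried without a cluster before wiring it to `kubectl`.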

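1.1 Download kustomize

Releases · kubernetes-sigs/kustomize

The commands below download v5.6.0, which is newer than the 5.2.1 listed above; substitute another release if you need an exact match.

wget https://gh-proxy.com/github.com/kubernetes-sigs/kustomize/releases/download/kustomize%2Fv5.6.0/kustomize_v5.6.0_linux_amd64.tar.gz
tar xvf kustomize_v5.6.0_linux_amd64.tar.gz
cp kustomize /usr/local/bin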
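To switch kustomize releases without hand-editing the URL, the download address can be built from the version number. The sketch below follows the URL pattern used in the command above; the function name `kustomize_url` and its default OS and architecture arguments are assumptions for illustration.

```shell
# kustomize_url VERSION [OS] [ARCH]: print the GitHub release URL for a kustomize tarball.
# The "%%2F" in the format string prints the literal "%2F" found in the release tag.
kustomize_url() {
  version=$1
  os=${2:-linux}
  arch=${3:-amd64}
  printf 'https://github.com/kubernetes-sigs/kustomize/releases/download/kustomize%%2Fv%s/kustomize_v%s_%s_%s.tar.gz\n' \
    "$version" "$version" "$os" "$arch"
}

# Hypothetical usage through the same proxy as above (not executed here):
#   url=$(kustomize_url 5.2.1)
#   wget "https://gh-proxy.com/${url#https://}"
#   kustomize version   # confirm the installed version afterwards
```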

1.2 Download the manifests

git clone git@github.com:kubeflow/manifests.git -b v1.9.1-branch
# wget https://codeload.github.com/kubeflow/manifests/zip/refs/heads/v1.9.1-branch
# unzip v1.9.1-branch

1.3 Modify the manifests

Change the storage class in the following files:

apps/katib/upstream/components/mysql/pvc.yaml
# common/oidc-client/oidc-authservice/base/pvc.yaml
apps/pipeline/upstream/third-party/minio/base/minio-pvc.yaml
apps/pipeline/upstream/third-party/mysql/base/mysql-pv-claim.yaml

Each PersistentVolumeClaim should end up with a storageClassName, for example:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-pv-claim
spec:
  storageClassName: local-path
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
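Editing each PVC by hand is error-prone, so the change can be scripted. This is a minimal sketch, assuming GNU `sed`, a top-level `spec:` line, and two-space indentation in the PVC files; `set_storage_class` is a hypothetical helper name.

```shell
# set_storage_class FILE CLASS: set spec.storageClassName in a single-document PVC manifest.
# Replaces an existing two-space-indented storageClassName line, or inserts one under "spec:".
set_storage_class() {
  file=$1
  class=$2
  if grep -q '^  storageClassName:' "$file"; then
    sed -i "s|^  storageClassName:.*|  storageClassName: $class|" "$file"
  else
    # GNU sed expands \n in the replacement into a newline.
    sed -i "s|^spec:\$|spec:\n  storageClassName: $class|" "$file"
  fi
}

# Hypothetical usage from the root of the manifests checkout (not executed here):
#   for f in apps/katib/upstream/components/mysql/pvc.yaml \
#            apps/pipeline/upstream/third-party/minio/base/minio-pvc.yaml \
#            apps/pipeline/upstream/third-party/mysql/base/mysql-pv-claim.yaml; do
#     set_storage_class "$f" local-path
#   done
```

Running `git diff` afterwards is a quick way to confirm that only the intended lines changed.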

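Change APP_SECURE_COOKIES (needed when the dashboard is accessed over plain HTTP rather than HTTPS):

vim ./apps/jupyter/jupyter-web-app/upstream/base/params.env
# Change the value to JWA_APP_SECURE_COOKIES=false
# Check whether the setting appears anywhere else
find ./ -type f -exec grep -l "APP_SECURE_COOKIES" {} \;
# Inspect the matches; they usually need no change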
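The same edit can be made non-interactively. The sketch below assumes GNU `sed` and a simple `KEY=value` layout in `params.env`; `set_param` is a hypothetical helper, not part of the manifests repository.

```shell
# set_param FILE KEY VALUE: set KEY=VALUE in a params.env style file,
# replacing an existing line for KEY or appending a new one.
set_param() {
  file=$1
  key=$2
  value=$3
  if grep -q "^${key}=" "$file"; then
    sed -i "s|^${key}=.*|${key}=${value}|" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Hypothetical usage from the root of the manifests checkout (not executed here):
#   set_param ./apps/jupyter/jupyter-web-app/upstream/base/params.env JWA_APP_SECURE_COOKIES false
#   grep -rl "APP_SECURE_COOKIES" .   # shorter equivalent of the find command above
```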

1.4 Set the default storage class

This is a precaution. The official documentation requires a default storage class, but in theory it is no longer needed once the storage class has been set explicitly as described above.

kubectl get sc
kubectl patch storageclass local-path -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
kubectl get sc
NAME                   PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
local-path (default)   rancher.io/local-path   Delete          WaitForFirstConsumer   false                  6d17h
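To confirm the patch took effect in a script rather than by eye, the `(default)` marker in the `kubectl get sc` table can be parsed. This is a minimal sketch that assumes the default table output format shown above; `default_storage_class` is a hypothetical helper name.

```shell
# default_storage_class: read `kubectl get sc` table output on stdin and
# print the name of every class marked "(default)".
default_storage_class() {
  awk 'NR > 1 && $2 == "(default)" { print $1 }'
}

# Hypothetical usage against a live cluster (not executed here):
#   [ "$(kubectl get sc | default_storage_class)" = "local-path" ] || echo "local-path is not the default"
```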

1.5 Expose the ingress gateway as a NodePort

find ./ -type f -exec grep -l "istio-ingressgateway" {} \;
vim ./common/istio-1-22/istio-install/base/patches/service.yaml

Change the file to the following:

apiVersion: v1
kind: Service
metadata:
  name: istio-ingressgateway
  namespace: istio-system
spec:
  type: NodePort

1.6 Modify the UI configuration (can also be done later)

This works around a Kubeflow bug.

vim ./apps/pipeline/upstream/base/pipeline/ml-pipeline-ui-deployment.yaml

Add the following environment variable:

- name: DISABLE_GKE_METADATA
  value: "true"

1.7 (Deprecated) Helm deployment

1.7.1 Download the Helm chart

git clone git@github.com:alauda/kubeflow-chart.git
# wget https://codeload.github.com/alauda/kubeflow-chart/zip/refs/heads/main

1.7.2 Set the default storage class

This is a precaution. The official documentation requires a default storage class, but in theory it is no longer needed once the storage class has been set explicitly as described above.

kubectl get sc
kubectl patch storageclass local-path -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
kubectl get sc

1.7.3 Modify the Helm values

cd kubeflow-chart/kubeflow-chart-main/charts/kubeflow
vim values.yaml
# In values.yaml, set:
controlPlane.useRuntimeDocker: false

2. Deployment

See kubeflow/manifests at v1.9.1-branch.

Note that the deployment command may have changed in the latest version, so the one below is not guaranteed to be accurate.

while ! kustomize build example | kubectl apply -f -; do echo "Retrying to apply resources"; sleep 10; done
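The loop above retries forever, which can hide a permanent failure. A bounded variant is sketched below; the `retry` and `apply_manifests` names are hypothetical, and the attempt count and delay are arbitrary values to adjust.

```shell
# retry MAX DELAY CMD...: run CMD until it succeeds, at most MAX times,
# sleeping DELAY seconds between attempts. Returns 1 if every attempt fails.
retry() {
  max=$1
  delay=$2
  shift 2
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Hypothetical usage from the root of the manifests checkout (not executed here).
# The pipeline is wrapped in a function so that retry can call it as one command:
#   apply_manifests() { kustomize build example | kubectl apply -f -; }
#   retry 30 10 apply_manifests
```

As in the original loop, the exit status of the pipeline is that of `kubectl apply`, so a failing `kustomize build` is only noticed indirectly.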

Fix the UI bug (if it was not fixed earlier, or if the error still occurs)

Error log:

FetchError: request to http://metadata/computeMetadata/v1/instance/attributes/cluster-name failed, reason: getaddrinfo ENOTFOUND metadata

Error shown on the page:

no healthy upstream

Fix:

kubectl edit deployment ml-pipeline-ui -n kubeflow
# Add the environment variable
- name: DISABLE_GKE_METADATA
  value: "true"
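The diagnosis and fix above can also be done without an interactive editor. This is a sketch only: `has_gke_metadata_error` is a hypothetical helper that matches the log line quoted above, and the commented commands use `kubectl logs` and `kubectl set env` as an alternative to `kubectl edit`.

```shell
# has_gke_metadata_error: read pod logs on stdin and succeed if they contain
# the failed GKE metadata lookup reported by ml-pipeline-ui.
has_gke_metadata_error() {
  grep -q 'computeMetadata/v1/instance/attributes/cluster-name failed'
}

# Hypothetical usage against a live cluster (not executed here):
#   if kubectl logs deployment/ml-pipeline-ui -n kubeflow | has_gke_metadata_error; then
#     kubectl set env deployment/ml-pipeline-ui -n kubeflow DISABLE_GKE_METADATA=true
#   fi
```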

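3. Access

Check the port of the istio-ingressgateway service (for example, in the output of kubectl get svc -A):
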
istio-system                istio-ingressgateway                                        NodePort    10.96.178.62    <none>        15021:30446/TCP,80:32287/TCP,443:31098/TCP   34m
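The NodePort for HTTP can be picked out of the PORT(S) column mechanically. The sketch below parses a string in the format shown above; `node_port_for` is a hypothetical helper, and the commented `jsonpath` query is an alternative that asks the API server directly.

```shell
# node_port_for PORTS PORT: given a PORT(S) column such as
# "15021:30446/TCP,80:32287/TCP,443:31098/TCP", print the NodePort mapped to PORT.
node_port_for() {
  ports=$1
  want=$2
  printf '%s\n' "$ports" | tr ',' '\n' | awk -F'[:/]' -v p="$want" '$1 == p { print $2 }'
}

# Hypothetical usage against a live cluster (not executed here):
#   kubectl get svc istio-ingressgateway -n istio-system \
#     -o jsonpath='{.spec.ports[?(@.port==80)].nodePort}'
```

With the sample output above, port 80 maps to NodePort 32287, so the dashboard would be reached at that port on any node's address.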

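Open the dashboard in a browser. The default account and password are:

user@example.com
12341234

It is a good idea to test in advance and pre-pull the notebook image:
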
kubeflownotebookswg/jupyter-scipy:v1.9.2

4. Create a new user

4.1 Create the user profile file

vim profile.yaml

The file contents are as follows:

apiVersion: kubeflow.org/v1beta1
kind: Profile
metadata:
  name: testuser # replace with the name of profile you want, this will be user's namespace name
spec:
  owner:
    kind: User
    name: testuser@163.com # replace with the email of the user
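When several users need profiles, the manifest can be generated from a template instead of edited by hand. This is a minimal sketch that reproduces the Profile shown above; `profile_manifest` is a hypothetical helper name.

```shell
# profile_manifest NAME EMAIL: print a Kubeflow Profile manifest.
# NAME becomes the user's namespace; EMAIL is the owner's login email.
profile_manifest() {
  cat <<EOF
apiVersion: kubeflow.org/v1beta1
kind: Profile
metadata:
  name: $1
spec:
  owner:
    kind: User
    name: $2
EOF
}

# Hypothetical usage against a live cluster (not executed here):
#   profile_manifest testuser testuser@163.com | kubectl apply -f -
```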

4.2 Generate the user's password hash

The command below requires the passlib and bcrypt Python packages.

python3 -c 'from passlib.hash import bcrypt; import getpass; print(bcrypt.using(rounds=12, ident="2y").hash(getpass.getpass()))'
Password:
$2y$12$71JAABB8swVwMFEPMQyhHe.lobVFAqKsoXwSZcFAVmuTnZvLBp1YS
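Before pasting a hash into the Dex configuration, it can help to check that it was copied intact. The sketch below only validates the textual shape of a bcrypt hash (prefix, two-digit cost, 53 characters of salt and digest); it does not verify any password, and `is_bcrypt_hash` is a hypothetical helper name.

```shell
# is_bcrypt_hash HASH: succeed if HASH has the textual shape of a bcrypt hash,
# i.e. "$2a$", "$2b$" or "$2y$", a two-digit cost, then 53 characters from [./A-Za-z0-9].
is_bcrypt_hash() {
  printf '%s\n' "$1" | grep -Eq '^\$2[aby]\$[0-9]{2}\$[./A-Za-z0-9]{53}$'
}

# Usage: quote the hash with single quotes so the shell does not expand "$2y" and friends.
#   is_bcrypt_hash '$2y$12$71JAABB8swVwMFEPMQyhHe.lobVFAqKsoXwSZcFAVmuTnZvLBp1YS' && echo "looks valid"
```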

4.3 Add the new user

kubectl apply -f profile.yaml
kubectl edit cm -n auth dex

Add the user entry:

- email: testuser@163.com
  hash: $2y$12$71JAABB8swVwMFEPMQyhHe.lobVFAqKsoXwSZcFAVmuTnZvLBp1YS
  username: testuser
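The user entry can be generated rather than typed, which avoids indentation mistakes. This sketch prints the three fields shown above; `dex_user_entry` is a hypothetical helper, and the assumption that the entry belongs in the `staticPasswords` list of the Dex config should be checked against the ConfigMap in your cluster.

```shell
# dex_user_entry EMAIL HASH USERNAME: print a YAML list item for a Dex static user.
# The "--" stops printf from treating the leading "-" of the format as an option.
dex_user_entry() {
  printf -- '- email: %s\n  hash: %s\n  username: %s\n' "$1" "$2" "$3"
}

# Usage: single-quote the hash so the shell leaves the "$" characters alone.
#   dex_user_entry testuser@163.com '$2y$12$71JAABB8swVwMFEPMQyhHe.lobVFAqKsoXwSZcFAVmuTnZvLBp1YS' testuser
# Paste the output into the ConfigMap, matching the indentation of the existing entries.
```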

Restart the pod:

kubectl rollout restart deployment dex -n auth

Delete a user:

kubectl delete profile testuser
# Then edit the dex ConfigMap again and remove the user's entry

5. Configure GPU

Installing the NVIDIA GPU Operator — NVIDIA GPU Operator

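6. Upgrade

Note: the cluster must meet the Kubernetes version requirement of the target Kubeflow release.

  • Build and deploy again (following the steps above)

  • Old components may need to be cleaned up manually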


  • Title: kubeflow
  • Author: Ethereal
  • Created at: 2025-05-14 12:23:58
  • Updated at: 2025-05-14 18:01:10
  • Link: https://ethereal-o.github.io/2025/05/14/kubeflow/
  • License: This work is licensed under CC BY-NC-SA 4.0.