2021-10-08
Technical Notes
Note: this article was written 1037 days ago and last modified 25 days ago; some of the information may be out of date.

Contents

Overview
Deployment
Deploy the gpu-share scheduler extender
Modify the scheduler configuration
Deploy the device plugin
Label the nodes
kubectl extension
Testing

This post walks through testing gpushare, the GPU virtualization solution released by Alibaba.

Overview

https://developer.aliyun.com/article/690623

gpu share scheduler extender https://github.com/AliyunContainerService/gpushare-scheduler-extender
gpu share device plugin https://github.com/AliyunContainerService/gpushare-device-plugin

Deployment

https://github.com/AliyunContainerService/gpushare-scheduler-extender/blob/master/docs/install.md

Deploy the gpu-share scheduler extender

Download scheduler-policy-config. This must be done on every master:

bash
cd /etc/kubernetes/
curl -O https://raw.githubusercontent.com/AliyunContainerService/gpushare-scheduler-extender/master/config/scheduler-policy-config.json

Note: the address 127.0.0.1:32766 in the downloaded file is reachable only on machines where gpushare-scheduler is deployed, so change it to the Service's IP and port.
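The address change described above can be scripted. The following is a minimal sketch, assuming the downloaded policy file contains the literal string `127.0.0.1:32766`; the function name `rewrite_extender_url` and the example IP and port are hypothetical, so look up the real values with `kubectl get svc -n kube-system`.

```shell
#!/usr/bin/env bash
# rewrite_extender_url FILE HOST PORT
# Replaces the loopback endpoint 127.0.0.1:32766 in FILE with HOST:PORT.
# (Hypothetical helper, not part of gpushare.)
rewrite_extender_url() {
  local file="$1" host="$2" port="$3"
  # -i.bak edits in place and keeps FILE.bak so the original can be restored.
  sed -i.bak "s#127\.0\.0\.1:32766#${host}:${port}#g" "$file"
}

# Example usage (the IP and port below are placeholders):
#   rewrite_extender_url /etc/kubernetes/scheduler-policy-config.json 10.96.0.50 12345
```

Keeping the `.bak` copy makes it easy to roll back if the scheduler fails to start after the change.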

Deploy the scheduler extender:

bash
cd /tmp/
curl -O https://raw.githubusercontent.com/AliyunContainerService/gpushare-scheduler-extender/master/config/gpushare-schd-extender.yaml
kubectl create -f gpushare-schd-extender.yaml

Modify the scheduler configuration

Do this on every master.

bash
cp /etc/kubernetes/manifests/kube-scheduler.yaml .
# edit kube-scheduler.yaml
cp kube-scheduler.yaml /etc/kubernetes/manifests/kube-scheduler.yaml
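The edit step is not spelled out here. As I read the install guide linked under Deployment, it adds a `--policy-config-file` flag pointing at the policy file and mounts that file into the scheduler pod through a hostPath volume; treat both points as assumptions and check them against the guide. Under those assumptions, a rough sanity check before copying the manifest back might look like this (the function name `scheduler_manifest_ready` is hypothetical, and the check is only a text heuristic, not a YAML validation):

```shell
#!/usr/bin/env bash
# scheduler_manifest_ready MANIFEST POLICY_PATH
# Succeeds only if MANIFEST passes POLICY_PATH via --policy-config-file and
# names the same path on at least three lines in total (assumed to be the
# flag, the container mountPath, and the hostPath volume).
scheduler_manifest_ready() {
  local manifest="$1" policy="$2"
  # The flag must point at the policy file.
  grep -qF -- "--policy-config-file=${policy}" "$manifest" || return 1
  # Count the lines that mention the policy path.
  local hits
  hits="$(grep -cF -- "$policy" "$manifest")"
  [ "$hits" -ge 3 ]
}

# Example usage:
#   scheduler_manifest_ready kube-scheduler.yaml /etc/kubernetes/scheduler-policy-config.json \
#     && cp kube-scheduler.yaml /etc/kubernetes/manifests/kube-scheduler.yaml
```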

Deploy the device plugin

bash
wget https://raw.githubusercontent.com/AliyunContainerService/gpushare-device-plugin/master/device-plugin-rbac.yaml
kubectl create -f device-plugin-rbac.yaml
wget https://raw.githubusercontent.com/AliyunContainerService/gpushare-device-plugin/master/device-plugin-ds.yaml
kubectl create -f device-plugin-ds.yaml

Label the nodes

bash
# A trailing "-" removes the nodeType label from the node
kubectl label node k8s-container-group-87-89 nodeType-
# Mark the node as a gpushare node
kubectl label node k8s-container-group-87-89 gpushare=true
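When several GPU nodes need the label, the same command can be wrapped in a loop. This is a small sketch; the function name `label_gpushare_nodes` and the node names in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# label_gpushare_nodes NODE...
# Adds gpushare=true to each node; --overwrite makes reruns harmless.
label_gpushare_nodes() {
  local node
  for node in "$@"; do
    kubectl label node "$node" gpushare=true --overwrite || return 1
  done
}

# Example usage (placeholder node names):
#   label_gpushare_nodes gpu-node-1 gpu-node-2
# List the nodes that carry the label (requires cluster access):
#   kubectl get nodes -l gpushare=true
```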

kubectl extension

bash
cd /usr/bin/
wget https://github.com/AliyunContainerService/gpushare-device-plugin/releases/download/v0.3.0/kubectl-inspect-gpushare
chmod u+x /usr/bin/kubectl-inspect-gpushare
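kubectl discovers plugins by looking for executables named `kubectl-*` on `PATH`, so a quick check that the download worked is to see whether the binary resolves. A minimal sketch, with a hypothetical function name:

```shell
#!/usr/bin/env bash
# gpushare_plugin_installed
# Succeeds if an executable named kubectl-inspect-gpushare is on PATH.
gpushare_plugin_installed() {
  command -v kubectl-inspect-gpushare >/dev/null 2>&1
}

# Example usage:
#   gpushare_plugin_installed && kubectl inspect gpushare
```

Once the binary is found, it is invoked as `kubectl inspect gpushare` to list GPU memory allocation per node.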

Testing

Reference: https://github.com/AliyunContainerService/gpushare-scheduler-extender/blob/master/docs/userguide.md
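For a quick smoke test, a pod can request shared GPU memory through the `aliyun.com/gpu-mem` extended resource described in the user guide. The sketch below only renders a manifest. The function name, pod name, and image are hypothetical, and the unit of the requested amount depends on how the device plugin is configured.

```shell
#!/usr/bin/env bash
# render_gpushare_pod NAME IMAGE GPU_MEM
# Prints a Pod manifest that requests GPU_MEM units of shared GPU memory.
render_gpushare_pod() {
  local name="$1" image="$2" gpu_mem="$3"
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ${name}
spec:
  containers:
  - name: ${name}
    image: ${image}
    resources:
      limits:
        aliyun.com/gpu-mem: ${gpu_mem}
EOF
}

# Example usage (placeholder pod name and image):
#   render_gpushare_pod gpushare-test example.com/cuda-app:latest 3 | kubectl apply -f -
#   kubectl inspect gpushare
```

Rendering to stdout keeps the manifest easy to inspect before it is piped to `kubectl apply`.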

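Author: renbear

Article link:

Copyright notice: unless otherwise stated, all articles on this blog are licensed under CC BY-NC 2.0. Please credit the source when reposting!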